{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "036b2acf",
   "metadata": {},
   "source": [
    "<h1>Analysis code for manuscript \"Livestock exposure, seasonal diet shifts, host individual, and time correlate with wild African savanna elephant gut microbiome diversity\"</h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c7e2a55",
   "metadata": {},
   "source": [
    "<h3>Import qiime2</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b270990f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import qiime2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c45e62be",
   "metadata": {},
   "source": [
    "<h2>Import data</h2>\n",
    "\n",
    "<h3>Raw demultiplexed data are available at\n",
    "    https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/R7RECE</h3>\n",
    "\n",
    "<h3>Data are also available in the NCBI SRA repository, BioProject ID: PRJNA1127764,\n",
    "    http://www.ncbi.nlm.nih.gov/bioproject/1127764</h3>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b50620a8",
   "metadata": {},
   "source": [
    "<h4>Run 2.1</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "54e236a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime tools import \\\n",
    "#   --type 'SampleData[PairedEndSequencesWithQuality]' \\\n",
    "#   --input-path /Volumes/JMP/Microbiome/S2.1 \\\n",
    "#   --input-format CasavaOneEightSingleLanePerSampleDirFmt \\\n",
    "#   --output-path 2.1_demux_paired_end.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55a5d146",
   "metadata": {},
   "source": [
    "<h4>Run 2.2</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ebc1567d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime tools import \\\n",
    "#   --type 'SampleData[PairedEndSequencesWithQuality]' \\\n",
    "#   --input-path /Volumes/JMP/Microbiome/S2.2 \\\n",
    "#   --input-format CasavaOneEightSingleLanePerSampleDirFmt \\\n",
    "#   --output-path 2.2_demux_paired_end.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac05491b",
   "metadata": {},
   "source": [
    "<h2>Make sure there are no adapters/primers with cutadapt</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a68dbac6",
   "metadata": {},
   "source": [
    "<h4>Run 2.1</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "fe8784d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime cutadapt trim-paired \\\n",
    "#   --i-demultiplexed-sequences 2.1_demux_paired_end.qza \\\n",
    "#   --o-trimmed-sequences 21_trimmed.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75c36dee",
   "metadata": {},
   "source": [
    "<h4>Run 2.2</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a5e5ebc4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime cutadapt trim-paired \\\n",
    "#   --i-demultiplexed-sequences 2.2_demux_paired_end.qza \\\n",
    "#   --o-trimmed-sequences 22_trimmed.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7781808",
   "metadata": {},
   "source": [
    "<h2>Make the trimmed .qza demultiplexed paired-end files viewable as .qzv files</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae61b150",
   "metadata": {},
   "source": [
    "<h4>Run 2.1</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "871d497d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime demux summarize \\\n",
    "#   --i-data 21_trimmed.qza \\\n",
    "#   --o-visualization 21_trimmed.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28c732a0",
   "metadata": {},
   "source": [
    "<h4>Run 2.2</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "f09b8bd4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime demux summarize \\\n",
    "#   --i-data 22_trimmed.qza \\\n",
    "#   --o-visualization 22_trimmed.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c41c9447",
   "metadata": {},
   "source": [
    "<h2>Import the tools needed to view visualizations with the QIIME 2 Python API</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "c334ffdb",
   "metadata": {},
   "outputs": [],
   "source": [
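    "# Visualization ships with the qiime2 package, so no separate pip install is needed.\n",
    "# To render .qzv files inline, enable the QIIME 2 Jupyter extension once, then restart the notebook server:\n",
    "# !jupyter serverextension enable --py qiime2 --sys-prefix\n",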
    "from qiime2 import Visualization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dea8c88f",
   "metadata": {},
   "source": [
    "<h2>Make the original tables prior to quality control (i.e., without trimming or truncating reads)</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa36b46d",
   "metadata": {},
   "source": [
    "<h4>Run 2.1</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "4c350978",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime dada2 denoise-paired \\\n",
    "#   --i-demultiplexed-seqs 21_trimmed.qza \\\n",
    "#   --p-trim-left-f 0 \\\n",
    "#   --p-trim-left-r 0 \\\n",
    "#   --p-trunc-len-f 251 \\\n",
    "#   --p-trunc-len-r 251 \\\n",
    "#   --o-table 21_trimmed_table_before_qc.qza \\\n",
    "#   --o-representative-sequences 21_trimmed_rep_seqs_before_qc.qza \\\n",
    "#   --o-denoising-stats 21_trimmed_denoising_stats_before_qc.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ab09a71",
   "metadata": {},
   "source": [
    "<h4>Run 2.2</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "cc5f9ada",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime dada2 denoise-paired \\\n",
    "#   --i-demultiplexed-seqs 22_trimmed.qza \\\n",
    "#   --p-trim-left-f 0 \\\n",
    "#   --p-trim-left-r 0 \\\n",
    "#   --p-trunc-len-f 251 \\\n",
    "#   --p-trunc-len-r 251 \\\n",
    "#   --o-table 22_trimmed_table_before_qc.qza \\\n",
    "#   --o-representative-sequences 22_trimmed_rep_seqs_before_qc.qza \\\n",
    "#   --o-denoising-stats 22_trimmed_denoising_stats_before_qc.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6094604",
   "metadata": {},
   "source": [
    "<h2>Merge the tables to obtain the original read count prior to quality control</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "912e5e0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table merge \\\n",
    "#   --i-tables 21_trimmed_table_before_qc.qza \\\n",
    "#   --i-tables 22_trimmed_table_before_qc.qza \\\n",
    "#   --p-overlap-method sum \\\n",
    "#   --o-merged-table merged_table_before_qc.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dfd5867d",
   "metadata": {},
   "source": [
    "<h2>Make a visualization of the merged feature table prior to quality control</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "27cf6eef",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table summarize \\\n",
    "#   --i-table merged_table_before_qc.qza \\\n",
    "#   --o-visualization merged_table_before_qc.qzv \\\n",
    "#   --m-sample-metadata-file metadata.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fa3d779",
   "metadata": {},
   "source": [
    "<h2>View the merged table to report the number of reads prior to quality control</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "c6a1540d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/c9be5670-68f6-4109-95c2-ac08d12c4596')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: c9be5670-68f6-4109-95c2-ac08d12c4596>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_before_qc = Visualization.load('merged_table_before_qc.qzv')\n",
    "viz_before_qc"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd426f99",
   "metadata": {},
   "source": [
    "<h3>Alternatively, if the visualization above does not render, download merged_table_before_qc.qzv from https://doi.org/10.7910/DVN/WPVNU4 and drag and drop it into QIIME 2 View at https://view.qiime2.org/. The same approach works for any of the .qzv files.</h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41de4fc8",
   "metadata": {},
   "source": [
    "<h2>Next, view the per-run .qzv files created above, one at a time, to decide where to trim and truncate reads.</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dfbc2ce1",
   "metadata": {},
   "source": [
    "<h4>Run 2.1</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "c01389ef",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/ecc8f163-4b33-4898-a03a-83d66884702a')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: ecc8f163-4b33-4898-a03a-83d66884702a>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_21 = Visualization.load('21_trimmed.qzv')\n",
    "viz_21"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "853a0c98",
   "metadata": {},
   "source": [
    "<h4>Based on the quality plots, forward reads will not be trimmed and will be truncated at position 225. Reverse reads will have their first base trimmed and will be truncated at position 186.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "52d8d897",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime dada2 denoise-paired \\\n",
    "#   --i-demultiplexed-seqs 21_trimmed.qza \\\n",
    "#   --p-trim-left-f 0 \\\n",
    "#   --p-trim-left-r 1 \\\n",
    "#   --p-trunc-len-f 225 \\\n",
    "#   --p-trunc-len-r 186 \\\n",
    "#   --o-table 21_table.qza \\\n",
    "#   --o-representative-sequences 21_rep_seqs.qza \\\n",
    "#   --o-denoising-stats 21_denoising_stats.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61e4f1df",
   "metadata": {},
   "source": [
    "<h4>Run 2.2</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "c1486775",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/f0cb2758-2675-476e-8256-6b7412563437')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: f0cb2758-2675-476e-8256-6b7412563437>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_22 = Visualization.load('22_trimmed.qzv')\n",
    "viz_22"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40c566c0",
   "metadata": {},
   "source": [
    "<h4>Based on the quality plots, forward reads will not be trimmed and will be truncated at position 177. Reverse reads will have their first 11 bases trimmed and will be truncated at position 165.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "8785fcea",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime dada2 denoise-paired \\\n",
    "#   --i-demultiplexed-seqs 22_trimmed.qza \\\n",
    "#   --p-trim-left-f 0 \\\n",
    "#   --p-trim-left-r 11 \\\n",
    "#   --p-trunc-len-f 177 \\\n",
    "#   --p-trunc-len-r 165 \\\n",
    "#   --o-table 22_table.qza \\\n",
    "#   --o-representative-sequences 22_rep_seqs.qza \\\n",
    "#   --o-denoising-stats 22_denoising_stats.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "783ecab4",
   "metadata": {},
   "source": [
    "<h2>Create visualizations for the resulting feature tables, representative sequence files, and denoising statistics</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c21f055d",
   "metadata": {},
   "source": [
    "<h4>Run 2.1</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "4621599c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table summarize \\\n",
    "#   --i-table 21_table.qza \\\n",
    "#   --o-visualization 21_table.qzv \\\n",
    "#   --m-sample-metadata-file metadata.tsv\n",
    "\n",
    "# !qiime feature-table tabulate-seqs \\\n",
    "#   --i-data 21_rep_seqs.qza \\\n",
    "#   --o-visualization 21_rep_seqs.qzv\n",
    "\n",
    "# !qiime metadata tabulate \\\n",
    "#   --m-input-file 21_denoising_stats.qza \\\n",
    "#   --o-visualization 21_denoising_stats.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1cd4bc1",
   "metadata": {},
   "source": [
    "<h4>Run 2.2</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "228f6a33",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table summarize \\\n",
    "#   --i-table 22_table.qza \\\n",
    "#   --o-visualization 22_table.qzv \\\n",
    "#   --m-sample-metadata-file metadata.tsv\n",
    "\n",
    "# !qiime feature-table tabulate-seqs \\\n",
    "#   --i-data 22_rep_seqs.qza \\\n",
    "#   --o-visualization 22_rep_seqs.qzv\n",
    "\n",
    "# !qiime metadata tabulate \\\n",
    "#   --m-input-file 22_denoising_stats.qza \\\n",
    "#   --o-visualization 22_denoising_stats.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cddd654",
   "metadata": {},
   "source": [
    "<h2>View the resulting .qzv files</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e5d7e8e",
   "metadata": {},
   "source": [
    "<h4>Run 2.1</h4>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b229565",
   "metadata": {},
   "source": [
    "<h5>Feature table</h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "d46d5b5b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/6ecc62c2-5949-478b-8b20-93506242a734')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: 6ecc62c2-5949-478b-8b20-93506242a734>"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_21_table = Visualization.load('21_table.qzv')\n",
    "viz_21_table"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6352bd5f",
   "metadata": {},
   "source": [
    "<h5>Representative sequences</h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "a768dd2a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/5cae4c52-84fe-4b82-9dcc-d459bb9d11d5')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: 5cae4c52-84fe-4b82-9dcc-d459bb9d11d5>"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_21_rep_seqs = Visualization.load('21_rep_seqs.qzv')\n",
    "viz_21_rep_seqs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af92504c",
   "metadata": {},
   "source": [
    "<h5>Denoising statistics</h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "a5bef9f9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/f765bd13-0815-4b46-9ee8-6d15fa3360f7')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: f765bd13-0815-4b46-9ee8-6d15fa3360f7>"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_21_dn = Visualization.load('21_denoising_stats.qzv')\n",
    "viz_21_dn"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b75ffd2a",
   "metadata": {},
   "source": [
    "<h4>Run 2.2</h4>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63a3a3cf",
   "metadata": {},
   "source": [
    "<h5>Feature table</h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "abbbbf2f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/1630a7d9-0fb9-4cf6-ba77-31acc9d04a4f')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: 1630a7d9-0fb9-4cf6-ba77-31acc9d04a4f>"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_22_table = Visualization.load('22_table.qzv')\n",
    "viz_22_table"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d22be1d6",
   "metadata": {},
   "source": [
    "<h5>Representative sequences</h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "4294031e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/e3df1001-9cea-4639-b8eb-e1668abf636f')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: e3df1001-9cea-4639-b8eb-e1668abf636f>"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_22_rep_seqs = Visualization.load('22_rep_seqs.qzv')\n",
    "viz_22_rep_seqs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e80cee26",
   "metadata": {},
   "source": [
     "<h5>Denoising statistics</h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "03c7839a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/74690b56-4abd-4f67-970d-19117cd9742c')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: 74690b56-4abd-4f67-970d-19117cd9742c>"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_22_dn = Visualization.load('22_denoising_stats.qzv')\n",
    "viz_22_dn"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7ead390",
   "metadata": {},
   "source": [
     "<h2>Merge feature tables</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "debd2a02",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table merge \\\n",
    "#  --i-tables 21_table.qza \\\n",
    "#  --i-tables 22_table.qza \\\n",
    "#  --p-overlap-method sum \\\n",
    "#  --o-merged-table table.qza"
   ]
  },
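  {
   "cell_type": "markdown",
   "id": "sketch-merge-sum-md",
   "metadata": {},
   "source": [
    "Sketch (not part of the original workflow): with `--p-overlap-method sum`, a sample present in both runs gets its per-feature counts added together instead of causing an error (as far as we know, the default overlap method raises an error on duplicate sample IDs). The toy Python below illustrates only the summing idea on made-up tables; `run_21`, `run_22`, `merge_sum`, and all IDs are hypothetical, and this is not QIIME 2's implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-merge-sum-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "# Toy feature tables as {sample_id: {feature_id: count}}; all IDs are made up.\n",
    "run_21 = {'S1': {'ASV1': 10, 'ASV2': 5}, 'S2': {'ASV1': 3}}\n",
    "run_22 = {'S2': {'ASV1': 4, 'ASV3': 2}, 'S3': {'ASV2': 7}}\n",
    "\n",
    "\n",
    "def merge_sum(*tables):\n",
    "    # Add up counts for every (sample, feature) pair seen in any table, so a\n",
    "    # sample present in both runs ends up with the sum of its counts.\n",
    "    merged = defaultdict(lambda: defaultdict(int))\n",
    "    for table in tables:\n",
    "        for sample, counts in table.items():\n",
    "            for feature, n in counts.items():\n",
    "                merged[sample][feature] += n\n",
    "    # Convert the nested defaultdicts back to plain dicts.\n",
    "    return {sample: dict(counts) for sample, counts in merged.items()}\n",
    "\n",
    "\n",
    "merged = merge_sum(run_21, run_22)\n",
    "print(merged['S2'])  # {'ASV1': 7, 'ASV3': 2}"
   ]
  },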
  {
   "cell_type": "markdown",
   "id": "bc409bd1",
   "metadata": {},
   "source": [
     "<h2>Merge representative sequences</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "7fcd3b0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table merge-seqs \\\n",
    "#  --i-data 21_rep_seqs.qza \\\n",
    "#  --i-data 22_rep_seqs.qza \\\n",
    "#  --o-merged-data rep_seqs.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0d55487",
   "metadata": {},
   "source": [
     "<h2>Create visualizations of the merged (\"master\") feature table and representative sequences</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "f7500a99",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table summarize \\\n",
    "#   --i-table table.qza \\\n",
    "#   --o-visualization table.qzv \\\n",
    "#   --m-sample-metadata-file metadata.tsv\n",
    "\n",
    "# !qiime feature-table tabulate-seqs \\\n",
    "#   --i-data rep_seqs.qza \\\n",
    "#   --o-visualization rep_seqs.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9d83572",
   "metadata": {},
   "source": [
     "<h2>View master table</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "b900b774",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/47f56d85-434e-4c2e-9785-687530a463f7')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: 47f56d85-434e-4c2e-9785-687530a463f7>"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_t = Visualization.load('table.qzv')\n",
    "viz_t"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "949d12c9",
   "metadata": {},
   "source": [
     "<h2>View representative sequence file</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "def0c505",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/31deec14-63b3-4441-888d-241c80ff51bf')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: 31deec14-63b3-4441-888d-241c80ff51bf>"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_rs = Visualization.load('rep_seqs.qzv')\n",
    "viz_rs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11ed7445",
   "metadata": {},
   "source": [
     "<h2>Remove sample \"doubles\" (samples from the same individual on the same date)</h2>\n",
     "<h3>These were originally intended to test sample degradation from freezing at different times (one of each pair sat out for a week before freezing), but we did not end up with enough sets of doubles.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "48c23a77",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table filter-samples \\\n",
    "#   --i-table table.qza \\\n",
    "#   --m-metadata-file metadata.tsv \\\n",
    "#   --p-where \"[Elephant_ID] IN ('M26.05.2','M9.02.2','R17.04.2','R17.08.2','R19.8801.2','R21.08.2','R26.00.2','R29.03.2','R7.9203.2','S92.06.2','Zodiacs.B.2')\" \\\n",
     "#   --p-exclude-ids \\\n",
    "#   --o-filtered-table filtered_table.qza"
   ]
  },
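  {
   "cell_type": "markdown",
   "id": "sketch-find-doubles-md",
   "metadata": {},
   "source": [
    "Sketch (not part of the original workflow): doubles are samples that share an individual and a collection date, so grouping metadata rows by those two fields is one way to list candidates before writing the `--p-where` clause. The notebook does not say how the doubles above were identified; the rows, their field order, and the `find_doubles` helper below are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-find-doubles-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "# Hypothetical metadata rows: (sample_id, individual, collection_date).\n",
    "# Real column names and values in metadata.tsv will differ.\n",
    "rows = [\n",
    "    ('S1', 'E1', '2019-03-02'),\n",
    "    ('S2', 'E1', '2019-03-02'),\n",
    "    ('S3', 'E1', '2019-05-10'),\n",
    "    ('S4', 'E2', '2019-03-02'),\n",
    "]\n",
    "\n",
    "\n",
    "def find_doubles(rows):\n",
    "    # Group sample IDs by (individual, date) and keep groups with >1 sample.\n",
    "    groups = defaultdict(list)\n",
    "    for sample_id, individual, date in rows:\n",
    "        groups[(individual, date)].append(sample_id)\n",
    "    return {key: ids for key, ids in groups.items() if len(ids) > 1}\n",
    "\n",
    "\n",
    "print(find_doubles(rows))  # {('E1', '2019-03-02'): ['S1', 'S2']}"
   ]
  },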
  {
   "cell_type": "markdown",
   "id": "96b3ec78",
   "metadata": {},
   "source": [
     "<h2>Create visualization to check that sample doubles were filtered out</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "c4204516",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table summarize \\\n",
    "#   --i-table filtered_table.qza \\\n",
    "#   --o-visualization filtered_table.qzv \\\n",
    "#   --m-sample-metadata-file metadata.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af739930",
   "metadata": {},
   "source": [
     "<h2>View</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "5a0d4732",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/2dbd6e2f-eb74-48a7-9ffd-21d26a980447')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: 2dbd6e2f-eb74-48a7-9ffd-21d26a980447>"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_ft = Visualization.load('filtered_table.qzv')\n",
    "viz_ft"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fc86fd6",
   "metadata": {},
   "source": [
     "<h2>Next, filter out potentially problematic samples, as identified from field and post-weighing notes (included in the metadata available at https://doi.org/10.7910/DVN/R7RECE). The reason each sample was discarded is given below.</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fdca28b",
   "metadata": {},
   "source": [
     "<h6>WildEle-010: sample got warm, so its microbial community may be unrepresentative</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7319a233",
   "metadata": {},
   "source": [
     "<h6>WildEle-036: same as above (sample got warm)</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a612ab0",
   "metadata": {},
   "source": [
     "<h6>WildEle-049: information may have been mixed up with WildEle-160</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29438159",
   "metadata": {},
   "source": [
     "<h6>WildEle-083: may have been left out of the freezer, and therefore may not be representative</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "315c4ccf",
   "metadata": {},
   "source": [
     "<h6>WildEle-128: sample got warm, so its microbial community may be unrepresentative</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3363e68d",
   "metadata": {},
   "source": [
     "<h6>WildEle-140: same as above (sample got warm)</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8fc1c16",
   "metadata": {},
   "source": [
     "<h6>WildEle-147: duplicate of another sample; either this or WildEle-165 should be R36.06, but it is unclear which, so both are dropped</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38d6afe0",
   "metadata": {},
   "source": [
     "<h6>WildEle-160: information may have been mixed up with WildEle-049</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06299160",
   "metadata": {},
   "source": [
     "<h6>WildEle-163: information may have been mixed up with another metadata entry; the elephant is the same, but the collection date would affect NDVI/diet results</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fbd575e5",
   "metadata": {},
   "source": [
     "<h6>WildEle-165: duplicate of another sample; either this or WildEle-147 should be R36.06, but it is unclear which, so both are dropped</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6e2f9ba",
   "metadata": {},
   "source": [
     "<h6>WildEle-191: information here may have been mixed up with WildEle-315, and because the two elephants' ages are very different, this sample must be dropped</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7aaf7b43",
   "metadata": {},
   "source": [
     "<h6>WildEle-193: may have been mistakenly frozen in ethanol; all other samples were frozen without ethanol, so this one is dropped for consistency</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2f473c8",
   "metadata": {},
   "source": [
     "<h6>WildEle-238: sample got warm, so its microbial community may be unrepresentative</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cde7a96a",
   "metadata": {},
   "source": [
     "<h6>WildEle-272: may have been mistakenly frozen in ethanol; all other samples were frozen without ethanol, so this one is dropped for consistency</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aef32f0b",
   "metadata": {},
   "source": [
     "<h6>WildEle-315: information here may have been mixed up with WildEle-191, and because the two elephants' ages are very different, this sample must be dropped</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d9f6831",
   "metadata": {},
   "source": [
     "<h6>WildEle-347: may have been left out of the freezer, and therefore may not be representative</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "730a33e6",
   "metadata": {},
   "source": [
     "<h6>WildEle-364: sample got warm, so its microbial community may be unrepresentative</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2eaf502a",
   "metadata": {},
   "source": [
     "<h6>Not dropping WildEle-067, as I am 95% confident the bull has been correctly assigned</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5af53c16",
   "metadata": {},
   "source": [
     "<h6>Not dropping WildEle-254 and WildEle-280: they are from the same elephant and their collection dates are close together, so NDVI/diet differences should not be an issue</h6>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "edb60da4",
   "metadata": {},
   "source": [
     "<h2>The new feature table without the removed samples is called \"filtered_table_2\"</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "d08ac314",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table filter-samples \\\n",
    "#   --i-table filtered_table.qza \\\n",
    "#   --m-metadata-file metadata.tsv \\\n",
    "#   --p-where \"[#SampleID] IN ('WildEle-010','WildEle-036','WildEle-049','WildEle-083','WildEle-128','WildEle-140','WildEle-147','WildEle-160','WildEle-163','WildEle-165','WildEle-191','WildEle-193','WildEle-238','WildEle-272','WildEle-315','WildEle-347','WildEle-364')\" \\\n",
     "#   --p-exclude-ids \\\n",
    "#   --o-filtered-table filtered_table_2.qza"
   ]
  },
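  {
   "cell_type": "markdown",
   "id": "sketch-where-clause-md",
   "metadata": {},
   "source": [
    "Sketch (not part of the original workflow): the 17 IDs in the `--p-where` clause above are typed by hand and have to agree with the notes listed earlier. The hypothetical `excluded` mapping and `where_clause` helper below keep each ID next to an abbreviated reason and generate the clause from that mapping; the printed clause should match the one passed to `filter-samples`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-where-clause-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper: each excluded sample ID sits next to its reason\n",
    "# (abbreviated from the notes above), and the --p-where clause is built from it.\n",
    "excluded = {\n",
    "    'WildEle-010': 'sample got warm',\n",
    "    'WildEle-036': 'sample got warm',\n",
    "    'WildEle-049': 'possibly mixed up with WildEle-160',\n",
    "    'WildEle-083': 'possibly left out of freezer',\n",
    "    'WildEle-128': 'sample got warm',\n",
    "    'WildEle-140': 'sample got warm',\n",
    "    'WildEle-147': 'possible duplicate of WildEle-165',\n",
    "    'WildEle-160': 'possibly mixed up with WildEle-049',\n",
    "    'WildEle-163': 'possibly mixed up with another entry (date)',\n",
    "    'WildEle-165': 'possible duplicate of WildEle-147',\n",
    "    'WildEle-191': 'possibly mixed up with WildEle-315',\n",
    "    'WildEle-193': 'possibly frozen in ethanol',\n",
    "    'WildEle-238': 'sample got warm',\n",
    "    'WildEle-272': 'possibly frozen in ethanol',\n",
    "    'WildEle-315': 'possibly mixed up with WildEle-191',\n",
    "    'WildEle-347': 'possibly left out of freezer',\n",
    "    'WildEle-364': 'sample got warm',\n",
    "}\n",
    "\n",
    "\n",
    "def where_clause(column, ids):\n",
    "    # repr() yields single-quoted literals because these IDs contain no quotes.\n",
    "    values = ','.join(repr(i) for i in sorted(ids))\n",
    "    return '[{}] IN ({})'.format(column, values)\n",
    "\n",
    "\n",
    "clause = where_clause('#SampleID', excluded)\n",
    "print(len(excluded))  # 17\n",
    "print(clause)"
   ]
  },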
  {
   "cell_type": "markdown",
   "id": "0e01bdfa",
   "metadata": {},
   "source": [
     "<h2>Create visualization to check that potentially problematic samples were filtered out</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "1cf72e4a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table summarize \\\n",
    "#   --i-table filtered_table_2.qza \\\n",
    "#   --o-visualization filtered_table_2.qzv \\\n",
    "#   --m-sample-metadata-file metadata.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60df0a19",
   "metadata": {},
   "source": [
     "<h2>View</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "bafc060a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/75f90f95-2ff5-4f38-9e85-2d051be937e7')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: 75f90f95-2ff5-4f38-9e85-2d051be937e7>"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_ft2 = Visualization.load('filtered_table_2.qzv')\n",
    "viz_ft2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5714beac",
   "metadata": {},
   "source": [
     "<h2>Filter representative sequences to match filtered_table_2 (no sample doubles, no potentially problematic samples)</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "75d71f5e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table filter-seqs \\\n",
    "#     --i-data rep_seqs.qza \\\n",
    "#     --i-table filtered_table_2.qza \\\n",
    "#     --o-filtered-data filtered_rep_seqs_2.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd1ff36d",
   "metadata": {},
   "source": [
     "<h2>Create visualization to check filtered representative sequences</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "cc85bb2e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table tabulate-seqs \\\n",
     "#   --i-data filtered_rep_seqs_2.qza \\\n",
    "#   --o-visualization filtered_rep_seqs_2.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd68f91c",
   "metadata": {},
   "source": [
     "<h2>View</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "c76954a0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/7ad27261-8ba3-4655-83ee-190c5cdde02e')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: 7ad27261-8ba3-4655-83ee-190c5cdde02e>"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_frs2 = Visualization.load('filtered_rep_seqs_2.qzv')\n",
    "viz_frs2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3789839",
   "metadata": {},
   "source": [
     "<h2>Assign taxonomy</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "9ad9a7c9",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# !qiime feature-classifier classify-sklearn  \\\n",
    "#   --i-classifier silva-138-99-515-806-nb-classifier.qza  \\\n",
    "#   --i-reads filtered_rep_seqs_2.qza  \\\n",
    "#   --o-classification taxonomy.qza --p-n-jobs 4"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14b58851",
   "metadata": {},
   "source": [
     "<h2>Remove features without a phylum-level classification, as well as features assigned to mitochondria or chloroplasts. The resulting feature table is called \"filtered_table_3\"</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "4a7b09c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime taxa filter-table \\\n",
    "#   --i-table filtered_table_2.qza \\\n",
    "#   --i-taxonomy taxonomy.qza \\\n",
    "#   --p-include p__ \\\n",
    "#   --p-exclude mitochondria,chloroplast \\\n",
    "#   --o-filtered-table filtered_table_3.qza"
   ]
  },
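  {
   "cell_type": "markdown",
   "id": "sketch-taxa-filter-md",
   "metadata": {},
   "source": [
    "Sketch (not part of the original workflow): `--p-include p__` keeps features whose taxonomy string contains `p__` (that is, classified at least to phylum), and `--p-exclude mitochondria,chloroplast` then drops features whose string contains either term. The toy Python below mimics that include/exclude logic; the taxonomy strings are invented SILVA 138-style examples, `keep_feature` is hypothetical, and the case-insensitive comparison is an assumption of this sketch, so check the q2-taxa `filter-table` documentation for the exact matching rules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-taxa-filter-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy taxonomy assignments (feature ID -> taxon string), invented for illustration.\n",
    "taxonomy = {\n",
    "    'f1': 'd__Bacteria; p__Firmicutes; c__Clostridia',\n",
    "    'f2': 'd__Bacteria',\n",
    "    'f3': 'Unassigned',\n",
    "    'f4': 'd__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__Mitochondria',\n",
    "    'f5': 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast',\n",
    "}\n",
    "\n",
    "\n",
    "def keep_feature(taxon, include=('p__',), exclude=('mitochondria', 'chloroplast')):\n",
    "    # Substring ('contains'-style) matching, lowercased on both sides in this\n",
    "    # sketch: keep if any include term matches and no exclude term matches.\n",
    "    t = taxon.lower()\n",
    "    has_include = any(term.lower() in t for term in include)\n",
    "    has_exclude = any(term.lower() in t for term in exclude)\n",
    "    return has_include and not has_exclude\n",
    "\n",
    "\n",
    "kept = sorted(f for f, taxon in taxonomy.items() if keep_feature(taxon))\n",
    "print(kept)  # ['f1']"
   ]
  },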
  {
   "cell_type": "markdown",
   "id": "d44ebfa3",
   "metadata": {},
   "source": [
     "<h2>Filter representative sequences to match filtered_table_3</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "d192bb03",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table filter-seqs \\\n",
    "#     --i-data filtered_rep_seqs_2.qza \\\n",
    "#     --i-table filtered_table_3.qza \\\n",
    "#     --o-filtered-data filtered_rep_seqs_3.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbf0a96b",
   "metadata": {},
   "source": [
     "<h2>Create visualizations to check the feature table and representative sequences</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "c7453ee9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table summarize \\\n",
    "#   --i-table filtered_table_3.qza \\\n",
    "#   --o-visualization filtered_table_3.qzv \\\n",
    "#   --m-sample-metadata-file metadata.tsv\n",
    "\n",
    "# !qiime feature-table tabulate-seqs \\\n",
    "#   --i-data filtered_rep_seqs_3.qza \\\n",
    "#   --o-visualization filtered_rep_seqs_3.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "448d2bd0",
   "metadata": {},
   "source": [
     "<h2>View filtered_table_3 (no sample doubles, no potentially problematic samples, no sequences without a phylum classification, no sequences from mitochondria or chloroplasts)</h2>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "9c9dff01",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><img onload=\"(function(div, url){\n",
       "if (typeof require !== 'undefined') {\n",
       "    var baseURL = require.toUrl('').split('/').slice(0, -2).join('/');\n",
       "} else {\n",
       "    var baseURL = JSON.parse(\n",
       "        document.getElementById('jupyter-config-data').innerHTML\n",
       "    ).baseUrl.slice(0, -1);\n",
       "}\n",
       "url = baseURL + url;\n",
       "fetch(url).then(function(res) {\n",
       "    if (res.status === 404) {\n",
       "        div.innerHTML = 'Install QIIME 2 Jupyter extension with:<br />' +\n",
       "                        '<code>jupyter serverextension enable --py qiime2' +\n",
       "                        ' --sys-prefix</code><br />then restart your server.' +\n",
       "                        '<br /><br />(Interactive output not available on ' +\n",
       "                        'static notebook viewer services like nbviewer.)';\n",
       "    } else if (res.status === 409) {\n",
       "        div.innerHTML = 'Visualization no longer in scope. Re-run this cell' +\n",
       "                        ' to see the visualization.';\n",
       "    } else if (res.ok) {\n",
       "        url = res.url;\n",
       "        div.innerHTML = '<iframe src=\\'' + url + '\\' style=\\'' +\n",
       "                        'width: 100%; height: 700px; border: 0;\\'>' +\n",
       "                        '</iframe><hr />Open in a: <a href=\\'' + url + '\\'' +\n",
       "                        ' target=\\'_blank\\'>new window</a>'\n",
       "    } else {\n",
       "        div.innerHTML = 'Something has gone wrong. Check notebook server for' +\n",
       "                        ' errors.';\n",
       "    }\n",
       "});\n",
       "})(this.parentElement, '/qiime2/redirect?location=/var/folders/k4/z__12n0x3y76hjlntc7r2cb00000gp/T/qiime2/jparker/data/55e972ee-6c36-4736-aa46-bec8aae5363b')\" src=\"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\" /></div>"
      ],
      "text/plain": [
       "<visualization: Visualization uuid: 55e972ee-6c36-4736-aa46-bec8aae5363b>"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "viz_ft3 = Visualization.load('filtered_table_3.qzv')\n",
    "viz_ft3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b04e8bbf",
   "metadata": {},
   "source": [
     "<h2>View corresponding representative sequences, filtered_rep_seqs_3 (no sample doubles, no potentially problematic samples, no sequences without a phylum classification, no mitochondria or chloroplasts)</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "4c05f48b",
   "metadata": {},
   "outputs": [],
   "source": [
    "viz_frs3 = Visualization.load('filtered_rep_seqs_3.qzv')\n",
    "viz_frs3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91abd2f6",
   "metadata": {},
   "source": [
    "<h2>Cut samples with <1000 reads<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23222f82",
   "metadata": {},
   "source": [
    "<h3>We removed all samples with <1000 reads, because a higher cutoff removed too many samples. We also watched the downstream plots for outlying samples that might need to be dropped because of low read counts, but detected none.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "eb9ddab4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table filter-samples \\\n",
    "#   --i-table filtered_table_3.qza \\\n",
    "#   --p-min-frequency 1000 \\\n",
    "#   --o-filtered-table filtered_table_4.qza"
   ]
  },
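  {
   "cell_type": "markdown",
   "id": "a1f0c001",
   "metadata": {},
   "source": [
    "<h4>A minimal pandas sketch of the per-sample cutoff applied by <code>--p-min-frequency 1000</code> above. It assumes a feature table with samples as rows and features as columns, and that QIIME 2 keeps samples whose total frequency is at least the minimum. The toy counts and the helper <code>filter_samples_min_frequency</code> are hypothetical illustrations, not QIIME 2 code; the real table could likely be viewed in the same layout with <code>qiime2.Artifact.load('filtered_table_3.qza').view(pd.DataFrame)</code>.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f0c002",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy feature table: rows are samples, columns are features (ASVs)\n",
    "toy_table = pd.DataFrame(\n",
    "    {'ASV1': [900, 50, 1200], 'ASV2': [200, 30, 0]},\n",
    "    index=['sampleA', 'sampleB', 'sampleC'],\n",
    ")\n",
    "\n",
    "\n",
    "def filter_samples_min_frequency(table, min_frequency=1000):\n",
    "    # Total reads per sample (row sums)\n",
    "    totals = table.sum(axis=1)\n",
    "    # Keep only samples whose total meets or exceeds the cutoff\n",
    "    return table.loc[totals >= min_frequency]\n",
    "\n",
    "\n",
    "filtered = filter_samples_min_frequency(toy_table, 1000)\n",
    "print(filtered.index.tolist())  # ['sampleA', 'sampleC']"
   ]
  },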
  {
   "cell_type": "markdown",
   "id": "95c4e415",
   "metadata": {},
   "source": [
    "<h2>Create a visualization to check the resulting feature table, filtered_table_4, after removing samples with <1000 reads</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "2c9fa2c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table summarize \\\n",
    "#   --i-table filtered_table_4.qza \\\n",
    "#   --o-visualization filtered_table_4.qzv \\\n",
    "#   --m-sample-metadata-file metadata.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d010f097",
   "metadata": {},
   "source": [
    "<h2>View filtered_table_4 (no duplicate samples, no potentially problematic samples, no sequences without a phylum classification, no mitochondria or chloroplasts, no samples with <1000 reads)</h2>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "d84fb8c8",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "viz_ft4 = Visualization.load('filtered_table_4.qzv')\n",
    "viz_ft4"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97b4d688",
   "metadata": {},
   "source": [
    "<h2>Proceeded to downstream analyses with n = 317 samples</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45671351",
   "metadata": {},
   "source": [
    "<h2><font color='green'>What taxa make up the core microbiome of African savanna elephants in Samburu, Kenya, as measured by occurrence?</font></h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66123513",
   "metadata": {},
   "source": [
    "<h2>Collapse the feature table to each taxonomic level, from phylum (level 2) down to genus (level 6)</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "b3046920",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime taxa collapse \\\n",
    "# --i-table filtered_table_4.qza \\\n",
    "# --i-taxonomy taxonomy.qza \\\n",
    "# --p-level 2 \\\n",
    "# --o-collapsed-table phylum_collapsed_table.qza\n",
    "\n",
    "# !qiime taxa collapse \\\n",
    "# --i-table filtered_table_4.qza \\\n",
    "# --i-taxonomy taxonomy.qza \\\n",
    "# --p-level 3 \\\n",
    "# --o-collapsed-table class_collapsed_table.qza\n",
    "\n",
    "# !qiime taxa collapse \\\n",
    "# --i-table filtered_table_4.qza \\\n",
    "# --i-taxonomy taxonomy.qza \\\n",
    "# --p-level 4 \\\n",
    "# --o-collapsed-table order_collapsed_table.qza\n",
    "\n",
    "# !qiime taxa collapse \\\n",
    "# --i-table filtered_table_4.qza \\\n",
    "# --i-taxonomy taxonomy.qza \\\n",
    "# --p-level 5 \\\n",
    "# --o-collapsed-table family_collapsed_table.qza\n",
    "\n",
    "# !qiime taxa collapse \\\n",
    "# --i-table filtered_table_4.qza \\\n",
    "# --i-taxonomy taxonomy.qza \\\n",
    "# --p-level 6 \\\n",
    "# --o-collapsed-table genus_collapsed_table.qza"
   ]
  },
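  {
   "cell_type": "markdown",
   "id": "a1f0c003",
   "metadata": {},
   "source": [
    "<h4>A simplified pandas sketch of what collapsing to a taxonomic level does, as we understand <code>qiime taxa collapse</code>: each taxonomy string is truncated to its first N ranks (2 = phylum through 6 = genus), and counts of features sharing the truncated label are summed. The counts, taxonomy strings, and <code>collapse</code> helper are hypothetical; QIIME 2 also pads annotations that stop above the requested level, which this sketch omits.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f0c004",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical counts: rows are samples, columns are features (ASVs)\n",
    "counts = pd.DataFrame(\n",
    "    {'ASV1': [10, 0], 'ASV2': [5, 7], 'ASV3': [1, 2]},\n",
    "    index=['sampleA', 'sampleB'],\n",
    ")\n",
    "# Hypothetical SILVA-style taxonomy strings, ranks separated by '; '\n",
    "taxonomy = {\n",
    "    'ASV1': 'd__Bacteria; p__Firmicutes; c__Clostridia',\n",
    "    'ASV2': 'd__Bacteria; p__Firmicutes; c__Bacilli',\n",
    "    'ASV3': 'd__Bacteria; p__Bacteroidota; c__Bacteroidia',\n",
    "}\n",
    "\n",
    "\n",
    "def collapse(counts, taxonomy, level):\n",
    "    # Truncate each taxonomy string to its first `level` ranks\n",
    "    labels = {asv: ';'.join(tax.split('; ')[:level]) for asv, tax in taxonomy.items()}\n",
    "    # Features become rows, are grouped by truncated label and summed, then transposed back\n",
    "    return counts.T.groupby(labels).sum().T\n",
    "\n",
    "\n",
    "print(collapse(counts, taxonomy, 2))"
   ]
  },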
  {
   "cell_type": "markdown",
   "id": "3ef56ae6",
   "metadata": {},
   "source": [
    "<h2>Convert the biom files to .tsv files, which can then be converted to .csv files for use in the R package CoreMicro</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4b25461",
   "metadata": {},
   "source": [
    "<h3>The .qza files created above were first unzipped manually on a laptop before running the code below</h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00ae82ba",
   "metadata": {},
   "source": [
    "<h4>Phylum<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "8d576f14",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i phylum_collapsed_table_unzipped/data/feature-table.biom -o phylum-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98f3957a",
   "metadata": {},
   "source": [
    "<h4>Class<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "cc3068c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i class_collapsed_table_unzipped/data/feature-table.biom -o class-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc85ca84",
   "metadata": {},
   "source": [
    "<h4>Order<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "43aeee94",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i order_collapsed_table_unzipped/data/feature-table.biom -o order-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6709dc32",
   "metadata": {},
   "source": [
    "<h4>Family<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "30f87c7a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i family_collapsed_table_unzipped/data/feature-table.biom -o family-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c15ce832",
   "metadata": {},
   "source": [
    "<h4>Genus<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "01ee5457",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i genus_collapsed_table_unzipped/data/feature-table.biom -o genus-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b53f6c6",
   "metadata": {},
   "source": [
    "<h2>Converted each level to .csv for use in R, deleting the top row (\"# Constructed from biom file\") in Excel before import; CoreMicro's occupancy_core() function does not work otherwise (see below)</h2>"
   ]
  },
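  {
   "cell_type": "markdown",
   "id": "a1f0c005",
   "metadata": {},
   "source": [
    "<h4>The manual Excel step could also be scripted. A minimal sketch, assuming each .tsv written by <code>biom convert --to-tsv</code> starts with a single comment line followed by a <code>#OTU ID</code> header row; <code>biom_tsv_to_csv</code> is a hypothetical helper, and the commented loop reuses the file names from this notebook.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f0c006",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def biom_tsv_to_csv(tsv_path, csv_path):\n",
    "    # Skip the first line ('# Constructed from biom file') so '#OTU ID' becomes the header\n",
    "    table = pd.read_csv(tsv_path, sep='\\t', skiprows=1)\n",
    "    # Write a plain comma-separated file without the pandas index\n",
    "    table.to_csv(csv_path, index=False)\n",
    "    return table\n",
    "\n",
    "\n",
    "# Usage on the tables made above (uncomment to run):\n",
    "# for level in ['phylum', 'class', 'order', 'family', 'genus']:\n",
    "#     biom_tsv_to_csv(f'{level}-table.tsv', f'{level}-table.csv')"
   ]
  },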
  {
   "cell_type": "markdown",
   "id": "73a42af4",
   "metadata": {},
   "source": [
    "<h2>Install the R kernel (used for mixed-effects linear regression) and rpy2, a package that lets R code run in a Python kernel when a cell starts with \"%%R\"</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "5ff4100b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# !conda install -c r r-irkernel -y\n",
    "# !pip install rpy2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c81075ff",
   "metadata": {},
   "source": [
    "<h2>Load rpy2 package to use R within Jupyter Notebook<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "5e65d0fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext rpy2.ipython"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94eafce9",
   "metadata": {},
   "source": [
    "<h2>Install the devtools (or remotes) package to fetch CoreMicro from GitHub, then install the CoreMicro package from Custer et al. 2023</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "ba7529dc",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# install.packages(\"devtools\")\n",
    "# library(devtools)\n",
    "\n",
    "# devtools::install_github(\"MayaGans/CoreMicro\")\n",
    "\n",
    "# install.packages(\"remotes\")\n",
    "# remotes::install_github(\"MayaGans/CoreMicro\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3e6e77f",
   "metadata": {},
   "source": [
    "<h3>If rpy2 is buggy, the R code can also be copied and pasted into RStudio; just remove the \"%R\" or \"%%R\" magic</h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8ea2646",
   "metadata": {},
   "source": [
    "<h2>Load CoreMicro package <h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "1a686ab6",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(CoreMicro)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4acee67f",
   "metadata": {},
   "source": [
    "<h2>Based on Neu et al. 2021 and Custer et al. 2023, we chose the occurrence-only method, starting with a threshold of greater than or equal to 70% (the threshold chosen by Thorel et al. 2023)</h2>"
   ]
  },
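  {
   "cell_type": "markdown",
   "id": "a1f0c007",
   "metadata": {},
   "source": [
    "<h4>A minimal pandas sketch of an occurrence-only (occupancy) core, as we understand the method: a taxon's occupancy is the fraction of samples in which it has a nonzero count, and taxa at or above the threshold are called core. The toy table and <code>occupancy_core_taxa</code> helper are hypothetical; CoreMicro's <code>occupancy_core()</code> is the function actually used below, and its exact boundary handling and output format should be checked in its documentation.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f0c008",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical collapsed table with taxa as rows and samples as columns\n",
    "toy_collapsed = pd.DataFrame(\n",
    "    {'S1': [12, 0, 3], 'S2': [8, 1, 0], 'S3': [20, 0, 0], 'S4': [5, 2, 7]},\n",
    "    index=['p__Firmicutes', 'p__Spirochaetota', 'p__Proteobacteria'],\n",
    ")\n",
    "\n",
    "\n",
    "def occupancy_core_taxa(table, prop_rep=0.7):\n",
    "    # Occupancy: fraction of samples (columns) with a nonzero count for each taxon\n",
    "    occupancy = (table > 0).mean(axis=1)\n",
    "    # Core taxa meet or exceed the occupancy threshold\n",
    "    return occupancy[occupancy >= prop_rep].index.tolist()\n",
    "\n",
    "\n",
    "print(occupancy_core_taxa(toy_collapsed, 0.7))  # ['p__Firmicutes']\n",
    "print(occupancy_core_taxa(toy_collapsed, 0.3))  # all three taxa (occupancy 1.0, 0.5, 0.5)"
   ]
  },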
  {
   "cell_type": "markdown",
   "id": "0a8f2aea",
   "metadata": {},
   "source": [
    "<h4>Phylum<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "68ebb31e",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "table <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/phylum-table.csv\", header=TRUE)\n",
    "occupancy_core(table, prop_rep = .7, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "314ffbc0",
   "metadata": {},
   "source": [
    "<h4>Class<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "2638feec",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "table2 <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/class-table.csv\", header=TRUE)\n",
    "occupancy_core(table2, prop_rep = .7, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "830320c1",
   "metadata": {},
   "source": [
    "<h4>Order<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "id": "55d062e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "table3 <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/order-table.csv\", header=TRUE)\n",
    "occupancy_core(table3, prop_rep = .7, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94f94cf9",
   "metadata": {},
   "source": [
    "<h4>Family<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "id": "ae5217c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "table4 <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/family-table.csv\", header=TRUE)\n",
    "occupancy_core(table4, prop_rep = .7, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4bf1aa6",
   "metadata": {},
   "source": [
    "<h4>Genus<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "id": "c1698c69",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "table5 <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/genus-table.csv\", header=TRUE)\n",
    "occupancy_core(table5, prop_rep = .7, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8a05375",
   "metadata": {},
   "source": [
    "<h2>Occupancy only, greater than or equal to 30%, for comparison with the 70% threshold, because any single arbitrary cutoff can be problematic when characterizing core microbiomes</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1d77174",
   "metadata": {},
   "source": [
    "<h4>Phylum<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "id": "93e22d89",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(table, prop_rep = .3, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ef0a988",
   "metadata": {},
   "source": [
    "<h4>Class<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "350f7ad1",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(table2, prop_rep = .3, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee963316",
   "metadata": {},
   "source": [
    "<h4>Order<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "6b858eb5",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(table3, prop_rep = .3, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33e4bf4d",
   "metadata": {},
   "source": [
    "<h4>Family<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "787f7185",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(table4, prop_rep = .3, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42ea33b9",
   "metadata": {},
   "source": [
    "<h4>Genus<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "dd37a2af",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(table5, prop_rep = .3, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d510f1cd",
   "metadata": {},
   "source": [
    "<h2><font color='green'>How does the core microbiome identified in our study compare to the core microbiome identified in another study of wild African elephants in Kenya?</font></h2>"
   ]
  },
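  {
   "cell_type": "markdown",
   "id": "a1f0c009",
   "metadata": {},
   "source": [
    "<h4>One simple way to compare two core lists is with set operations. A minimal sketch: the taxon names are hypothetical placeholders, to be replaced with the <code>occupancy_core()</code> results for each study, and the Jaccard index is an illustrative choice rather than a method from the manuscript.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f0c00a",
   "metadata": {},
   "outputs": [],
   "source": [
    "def compare_cores(ours, theirs):\n",
    "    ours, theirs = set(ours), set(theirs)\n",
    "    shared = ours & theirs\n",
    "    union = ours | theirs\n",
    "    return {\n",
    "        'shared': sorted(shared),\n",
    "        'only_ours': sorted(ours - theirs),\n",
    "        'only_theirs': sorted(theirs - ours),\n",
    "        # Jaccard index: shared taxa divided by all taxa found in either core\n",
    "        'jaccard': len(shared) / len(union) if union else 0.0,\n",
    "    }\n",
    "\n",
    "\n",
    "# Hypothetical placeholder taxa\n",
    "result = compare_cores(['taxon_A', 'taxon_B', 'taxon_C'], ['taxon_A', 'taxon_B', 'taxon_D'])\n",
    "print(result)"
   ]
  },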
  {
   "cell_type": "markdown",
   "id": "c2251139",
   "metadata": {},
   "source": [
    "<h2>Fetch the FASTQ files from Budd et al. 2020 and process them through the same pipeline as our samples</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c3125c0",
   "metadata": {},
   "source": [
    "<h3>Install q2-fondue to retrieve FASTQ files from NCBI</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "id": "c7913e41",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !conda install -y \\\n",
    "#    -c https://packages.qiime2.org/qiime2/2023.2/tested/ \\\n",
    "#    -c conda-forge -c bioconda -c defaults \\\n",
    "#    q2-fondue"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "258b769d",
   "metadata": {},
   "source": [
    "<h4>Refresh cache and check functionality<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "id": "6b64a1b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime dev refresh-cache\n",
    "# !qiime fondue --help"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28ccdebb",
   "metadata": {},
   "source": [
    "<h4>Make sure the SRA Toolkit is configured on the system</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "91f15744",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !vdb-config -i"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21aca0bf",
   "metadata": {},
   "source": [
    "<h4>Exit the interactive configuration interface</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "a576e108",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !x"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09d9ee55",
   "metadata": {},
   "source": [
    "<h3>Import a .tsv listing the SRA accession IDs to retrieve, converting it into a .qza file that QIIME 2 can read</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "41ccb4e3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime tools import \\\n",
    "#       --type NCBIAccessionIDs \\\n",
    "#       --input-path Budd_SRA_ids.tsv \\\n",
    "#       --output-path Budd_SRA_ids.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37cdee4e",
   "metadata": {},
   "source": [
    "<h3>Retrieve SRA files<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "e77c43ec",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# !qiime fondue get-all \\\n",
    "#       --i-accession-ids Budd_SRA_ids.qza \\\n",
    "#       --p-email jennaparker13@gmail.com \\\n",
    "#       --p-retries 3 \\\n",
    "#       --verbose \\\n",
    "#       --output-dir Budd-fondue-output"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f6b786b",
   "metadata": {},
   "source": [
    "<h3>Make sure there are no adapters/primers in Budd et al.'s files with cutadapt<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "5b391bdc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime cutadapt trim-paired \\\n",
    "# --i-demultiplexed-sequences Budd-fondue-output/paired_reads.qza \\\n",
    "# --o-trimmed-sequences Budd_trimmed.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e49a1693",
   "metadata": {},
   "source": [
    "<h3>Make Budd_trimmed.qza viewable as .qzv file<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "b82a7fd3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime demux summarize \\\n",
    "#   --i-data Budd_trimmed.qza \\\n",
    "#   --o-visualization Budd_trimmed.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c602f78",
   "metadata": {},
   "source": [
    "<h3>View to decide where to trim and truncate reads<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "id": "f5749786",
   "metadata": {},
   "outputs": [],
   "source": [
    "viz_Budd_trimmed = Visualization.load('Budd_trimmed.qzv')\n",
    "viz_Budd_trimmed"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ba21d9a",
   "metadata": {},
   "source": [
    "<h4>Based on the visualization, neither forward nor reverse reads are trimmed at the 5' end; forward reads are truncated at position 248 and reverse reads at position 236.</h4>"
   ]
  },
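  {
   "cell_type": "markdown",
   "id": "a1f0c00b",
   "metadata": {},
   "source": [
    "<h4>A quick arithmetic check that these truncation lengths still leave enough overlap for DADA2 to merge read pairs. A minimal sketch: the amplicon lengths are assumptions (roughly 253 bp for the 515F/806R V4 region targeted by the classifier used here after primer removal, or about 292 bp with primers retained), and 12 is, to our knowledge, the default <code>--p-min-overlap</code> in <code>qiime dada2 denoise-paired</code>; <code>paired_read_overlap</code> is a hypothetical helper.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f0c00c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def paired_read_overlap(trunc_len_f, trunc_len_r, amplicon_len):\n",
    "    # Bases covered by both the truncated forward and the truncated reverse read\n",
    "    return trunc_len_f + trunc_len_r - amplicon_len\n",
    "\n",
    "\n",
    "MIN_OVERLAP = 12  # assumed q2-dada2 default for --p-min-overlap\n",
    "\n",
    "for amplicon_len in (253, 292):  # assumed V4 lengths without / with primers\n",
    "    overlap = paired_read_overlap(248, 236, amplicon_len)\n",
    "    print(amplicon_len, overlap, overlap >= MIN_OVERLAP)"
   ]
  },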
  {
   "cell_type": "markdown",
   "id": "9336928b",
   "metadata": {},
   "source": [
    "<h3>The following cell of code had to be run on a supercomputer due to the large size of Budd_trimmed.qza.<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "id": "2cb33c21",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime dada2 denoise-paired \\\n",
    "#   --i-demultiplexed-seqs Budd_trimmed.qza \\\n",
    "#   --p-trim-left-f 0 \\\n",
    "#   --p-trim-left-r 0 \\\n",
    "#   --p-trunc-len-f 248 \\\n",
    "#   --p-trunc-len-r 236 \\\n",
    "#   --verbose \\\n",
    "#   --o-table Budd_table.qza \\\n",
    "#   --o-representative-sequences Budd_rep_seqs.qza \\\n",
    "#   --o-denoising-stats Budd_denoising_stats.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "169e8fd8",
   "metadata": {},
   "source": [
    "<h3>Make Budd feature table and rep sequences visualization files<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "id": "6f15a670",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table summarize \\\n",
    "#   --i-table Budd_table.qza \\\n",
    "#   --o-visualization Budd_table.qzv \\\n",
    "#   --m-sample-metadata-file Budd-fondue-output/unzipped/data/sra-metadata.tsv\n",
    "\n",
    "# !qiime feature-table tabulate-seqs \\\n",
    "#   --i-data Budd_rep_seqs.qza \\\n",
    "#   --o-visualization Budd_rep_seqs.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a299c34",
   "metadata": {},
   "source": [
    "<h3>Visualize Budd et al. feature table<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "id": "75cc1cc7",
   "metadata": {},
   "outputs": [],
   "source": [
    "viz_Budd_table = Visualization.load('Budd_table.qzv')\n",
    "viz_Budd_table"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b9228be",
   "metadata": {},
   "source": [
    "<h3>Visualize Budd et al. rep sequences<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "id": "df6098cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "viz_Budd_seqs = Visualization.load('Budd_rep_seqs.qzv')\n",
    "viz_Budd_seqs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b1e63c0",
   "metadata": {},
   "source": [
    "<h2>None of the Budd et al. samples had <1000 reads, so no removal was necessary</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76d5ea83",
   "metadata": {},
   "source": [
    "<h2>Assign taxonomy - this step also had to be run on a supercomputer<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "id": "19a35bf2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-classifier classify-sklearn  \\\n",
    "#   --i-classifier silva-138-99-515-806-nb-classifier.qza  \\\n",
    "#   --i-reads Budd_rep_seqs.qza  \\\n",
    "#   --verbose \\\n",
    "#   --o-classification Budd_taxonomy.qza --p-n-jobs 4"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30558c3b",
   "metadata": {},
   "source": [
    "<h2>Remove sequences with no phylum-level classification, mitochondria, and chloroplasts (supercomputer)<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "id": "2e4f4eb4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime taxa filter-table \\\n",
    "#   --i-table Budd_table.qza \\\n",
    "#   --i-taxonomy Budd_taxonomy.qza \\\n",
    "#   --p-include p__ \\\n",
    "#   --p-exclude mitochondria,chloroplast \\\n",
    "#   --o-filtered-table Budd_table_final.qza"
   ]
  },
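  {
   "cell_type": "markdown",
   "id": "a1f0c00d",
   "metadata": {},
   "source": [
    "<h4>A minimal sketch of the include/exclude logic above, assuming (as we understand the default <code>contains</code> mode of <code>qiime taxa filter-table</code>) case-insensitive substring matching: a feature is kept if its taxonomy contains <code>p__</code> and contains neither <code>mitochondria</code> nor <code>chloroplast</code>. The taxonomy strings and the <code>keep_feature</code> helper are hypothetical.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f0c00e",
   "metadata": {},
   "outputs": [],
   "source": [
    "def keep_feature(taxon, include=('p__',), exclude=('mitochondria', 'chloroplast')):\n",
    "    t = taxon.lower()  # compare case-insensitively\n",
    "    has_included = any(term.lower() in t for term in include)\n",
    "    has_excluded = any(term.lower() in t for term in exclude)\n",
    "    return has_included and not has_excluded\n",
    "\n",
    "\n",
    "# Hypothetical SILVA-style annotations\n",
    "examples = {\n",
    "    'ASV1': 'd__Bacteria; p__Firmicutes; c__Clostridia',\n",
    "    'ASV2': 'd__Bacteria',  # no phylum-level classification\n",
    "    'ASV3': 'd__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__Mitochondria',\n",
    "    'ASV4': 'd__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast',\n",
    "}\n",
    "kept = [asv for asv, taxon in examples.items() if keep_feature(taxon)]\n",
    "print(kept)  # ['ASV1']"
   ]
  },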
  {
   "cell_type": "markdown",
   "id": "247651df",
   "metadata": {},
   "source": [
    "<h2>Determine the core microbiome of Budd et al.'s unrarefied samples at greater than or equal to 70% occurrence</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31e8056a",
   "metadata": {},
   "source": [
    "<h3>Collapse the table at each taxonomic level to make files for R's CoreMicro</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "id": "731c04c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime taxa collapse \\\n",
    "# --i-table Budd_table_final.qza \\\n",
    "# --i-taxonomy Budd_taxonomy.qza \\\n",
    "# --p-level 2 \\\n",
    "# --o-collapsed-table Budd_phylum_collapsed_table.qza\n",
    "\n",
    "# !qiime taxa collapse \\\n",
    "# --i-table Budd_table_final.qza \\\n",
    "# --i-taxonomy Budd_taxonomy.qza \\\n",
    "# --p-level 3 \\\n",
    "# --o-collapsed-table Budd_class_collapsed_table.qza\n",
    "\n",
    "# !qiime taxa collapse \\\n",
    "# --i-table Budd_table_final.qza \\\n",
    "# --i-taxonomy Budd_taxonomy.qza \\\n",
    "# --p-level 4 \\\n",
    "# --o-collapsed-table Budd_order_collapsed_table.qza\n",
    "\n",
    "# !qiime taxa collapse \\\n",
    "# --i-table Budd_table_final.qza \\\n",
    "# --i-taxonomy Budd_taxonomy.qza \\\n",
    "# --p-level 5 \\\n",
    "# --o-collapsed-table Budd_family_collapsed_table.qza\n",
    "\n",
    "# !qiime taxa collapse \\\n",
    "# --i-table Budd_table_final.qza \\\n",
    "# --i-taxonomy Budd_taxonomy.qza \\\n",
    "# --p-level 6 \\\n",
    "# --o-collapsed-table Budd_genus_collapsed_table.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf7541b2",
   "metadata": {},
   "source": [
    "<h2>Convert the biom files to .tsv files, which can then be converted to .csv files for use in the R package CoreMicro</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0586fb61",
   "metadata": {},
   "source": [
    "<h3>The .qza files created above were first unzipped manually on a laptop before running the code below</h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ef87e61",
   "metadata": {},
   "source": [
    "<h4>Phylum<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "id": "2cd4b968",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i Budd_phylum_collapsed_table_unzipped/data/feature-table.biom -o Budd_phylum-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50025918",
   "metadata": {},
   "source": [
    "<h4>Class<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "id": "3c7a7258",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i Budd_class_collapsed_table_unzipped/data/feature-table.biom -o Budd_class-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ca51ce9",
   "metadata": {},
   "source": [
    "<h4>Order<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "id": "f6f6965d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i Budd_order_collapsed_table_unzipped/data/feature-table.biom -o Budd_order-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19a52e4c",
   "metadata": {},
   "source": [
    "<h4>Family<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "id": "90443f6d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i Budd_family_collapsed_table_unzipped/data/feature-table.biom -o Budd_family-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86cfdd2c",
   "metadata": {},
   "source": [
    "<h4>Genus<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "id": "77300700",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i Budd_genus_collapsed_table_unzipped/data/feature-table.biom -o Budd_genus-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "819ec771",
   "metadata": {},
   "source": [
    "<h2>Converted each level to .csv for use in R, deleting the top row (\"# Constructed from biom file\") in Excel before import; CoreMicro's occupancy_core() function does not work otherwise (see below)</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ab9f602",
   "metadata": {},
   "source": [
    "<h2> CoreMicro occurrence-only greater than or equal to 70% threshold<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "694243c6",
   "metadata": {},
   "source": [
    "<h4>Phylum<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "id": "9b6b7466",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "tableB <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/Budd_phylum-table.csv\", header=TRUE)\n",
    "occupancy_core(tableB, prop_rep = .7, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57f2b32f",
   "metadata": {},
   "source": [
    "<h4><font color='red'>Check for Armatimonadota in greater than or equal to 30% for Budd</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "id": "248db70d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(tableB, prop_rep = .3, taxa_as_rows = TRUE) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de739c34",
   "metadata": {},
   "source": [
    "<h4>Class<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "id": "cd1fbb22",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "tableB_2 <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/Budd_class-table.csv\", header=TRUE)\n",
    "occupancy_core(tableB_2, prop_rep = .7, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7d157ed",
   "metadata": {},
   "source": [
    "<h4><font color ='red'> Check for uncultured Armatimonadota and Actinobacteria in greater than or equal to 30% core for Budd</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "id": "96cbc7e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(tableB_2, prop_rep = .3, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72c9d552",
   "metadata": {},
   "source": [
    "<h4>Order<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "id": "13354060",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "tableB_3 <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/Budd_order-table.csv\", header=TRUE)\n",
    "occupancy_core(tableB_3, prop_rep = .7, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "826d097b",
   "metadata": {},
   "source": [
    "<h4><font color = 'red'>Check for uncultured Armatimonadota and Micrococcales in greater than or equal to 30% core for Budd</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "id": "6966f07c",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(tableB_3, prop_rep = .3, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4faaedd4",
   "metadata": {},
   "source": [
    "<h4><font color ='red'> Check for Acholeplasmatales, Paenibacillales, Rhodospirillales, and vadin BB60 group in greater than or equal to 10% core for our samples</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "id": "04143653",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(table3, prop_rep = .1, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fdc5459",
   "metadata": {},
   "source": [
    "<h4>Family<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "id": "a2464e65",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "tableB_4 <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/Budd_family-table.csv\", header=TRUE)\n",
    "occupancy_core(tableB_4, prop_rep = .7, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cebb20e",
   "metadata": {},
   "source": [
    "<h4><font color = 'red'>Check for Enterococcaceae, Micrococcaceae, uncultured order Coriobacteriales, and uncultured phylum Armatimonadota in greater than or equal to 30% core for Budd</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "id": "8191b679",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(tableB_4, prop_rep = .3, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1cf98be",
   "metadata": {},
   "source": [
    "<h4><font color = 'red'>Check for uncultured order Coriobacteriales in greater than or equal to 10% core for Budd</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "id": "610066b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(tableB_4, prop_rep = .1, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31d07658",
   "metadata": {},
   "source": [
    "<h4><font color ='red'> Check for Acholeplasmataceae, Bacteroidales UCG-001, Clostridia vadinBB6 group, Comamonadaceae, Izemoplamataceae, Marinifilaceae,Paenibacillaceae, sutterellaceae, Weeksellaceae, and uncultured Rhodospirillales in greater than or equal to 10% core for our samples</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "id": "667696b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(table4, prop_rep = .1, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56f999ac",
   "metadata": {},
   "source": [
    "<h4>Genus<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "id": "1cfd90db",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "tableB_5 <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/Budd_genus-table.csv\", header=TRUE)\n",
    "occupancy_core(tableB_5, prop_rep = .7, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "658945cc",
   "metadata": {},
   "source": [
    "<h4><font color = 'red'>Check for Arthobacter, Cpla-4 termite group, Enterococcus, p-1088-a5 gut group, Phoenicibacter, uncultured Prevotellaceae, uncultured order Coriobacteriales, and uncultured phylum Armatimonadota in greater than or equal to 30% core for Budd</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "id": "a12a26a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(tableB_5, prop_rep = .3, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0402427",
   "metadata": {},
   "source": [
    "<h4><font color = 'red'>Check for Arthobacter in greater than or equal to 10% core for Budd</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "id": "599e1f37",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(tableB_5, prop_rep = .1, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61ceda10",
   "metadata": {},
   "source": [
    "<h4><font color ='red'> Check for Alistipes, Agathobacter, Cloacibacillus, Clostridia vadinB60 group, Escheria-Shigella, Izemoplasmataceae, Lysinibacillus, Paenibacillus, Pseudobutyrivibrio, Sutterellaceae, uncultured of family Erysipelatoclostridiaceae, uncultured of family Marinifilaceae, uncultured of family Weeksellaceae, and uncultured of order Rhodospirillales in greater than or equal to 10% core for our samples</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "id": "5285a0e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "occupancy_core(table5, prop_rep = .1, taxa_as_rows = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64e36158",
   "metadata": {},
   "source": [
    "<h4>When we did not find a taxon in at least 10% of samples, checked .csv's with the find function to confirm presence and calculate percentage<h4>"
   ]
  },
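  {
   "cell_type": "markdown",
   "id": "5c1e9a01",
   "metadata": {},
   "source": [
    "<h4>The manual search described above could also be scripted. Below is a minimal Python sketch, assuming each taxa table is laid out as in the occupancy_core() calls (taxa as rows, the first column holding the taxon name, and the remaining columns holding per-sample counts). The helper taxon_prevalence is hypothetical and was not part of the original workflow: for every row whose name contains a search string, it returns the fraction of samples with a nonzero count.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c1e9a02",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import io\n",
    "\n",
    "\n",
    "def taxon_prevalence(csv_text, taxon_substring):\n",
    "    # Hypothetical helper (not part of the original workflow).\n",
    "    # Assumed layout: a taxon-name column followed by one column per\n",
    "    # sample, with one row per taxon holding per-sample counts.\n",
    "    rows = list(csv.reader(io.StringIO(csv_text)))\n",
    "    header, body = rows[0], rows[1:]\n",
    "    n_samples = len(header) - 1\n",
    "    prevalence = {}\n",
    "    for row in body:\n",
    "        name, counts = row[0], row[1:]\n",
    "        if taxon_substring in name:\n",
    "            # Fraction of samples in which this taxon has a nonzero count\n",
    "            present = sum(1 for c in counts if float(c) > 0)\n",
    "            prevalence[name] = present / n_samples\n",
    "    return prevalence\n",
    "\n",
    "\n",
    "# Example usage (file name from the cells above; location assumed):\n",
    "# with open('Budd_genus-table.csv', newline='') as fh:\n",
    "#     print(taxon_prevalence(fh.read(), 'Arthrobacter'))"
   ]
  },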
  {
   "cell_type": "markdown",
   "id": "bfbdcb98",
   "metadata": {},
   "source": [
    "<h1><font color='red'>Please note that the below q2-longitudinal analysis with respect to time is no longer part of the manuscript in the same way following review. Instead, we included time as a factor in the new Bayesian analysis, and present the below as a follow-up to those findings. </font><h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e008897",
   "metadata": {},
   "source": [
    "<h2><font color='green'>What do individuals' microbiomes look like over time? If they change with time, what is changing the most?</font><h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6637b99b",
   "metadata": {},
   "source": [
    "<h2>Use q2-longitudinal plugin to conduct volatility analysis at the level of genus<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac8e241a",
   "metadata": {},
   "source": [
    "<h3>First convert counts to relative frequencies in preparation<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "id": "1af70cbf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table relative-frequency \\\n",
    "#   --i-table genus_collapsed_table.qza \\\n",
    "#   --o-relative-frequency-table genus-rf-table.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e98ef44f",
   "metadata": {},
   "source": [
    "<h3>Look at volatility in general first, defined as \"the degree of compositional change over time\" by Bastiaanssen et al. 2021. X-axis is \"State\", which is a number relative to the first day of sampling, with the first day of sampling as \"1\" and each day progressing from there. Before running volatility analysis, removed columns with \"NAs\" from metadata_2.tsv using Excel and Numbers and saved as \"metadata_3.tsv\". The visualization did not work with JSON otherwise. In the \"metadata_3.tsv\" file, we also created a column for which every sample had the value of \"All_samples\" for clearer plotting.<h3>"
   ]
  },
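  {
   "cell_type": "markdown",
   "id": "5c1e9a03",
   "metadata": {},
   "source": [
    "<h4>The State values and the NA-column cleanup described above were prepared by hand. As an illustration only, here is a minimal Python sketch of both steps, assuming dates are written as month/day/four-digit year and that State counts days from the earliest sampling date in the file. The helper names sampling_state and drop_na_columns are hypothetical.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c1e9a04",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "\n",
    "\n",
    "def sampling_state(date_strings, fmt='%m/%d/%Y'):\n",
    "    # Hypothetical helper: earliest date -> State 1, and later dates\n",
    "    # count consecutive days from it (assumed definition of State).\n",
    "    days = [datetime.strptime(s, fmt).date() for s in date_strings]\n",
    "    first = min(days)\n",
    "    return [(d - first).days + 1 for d in days]\n",
    "\n",
    "\n",
    "def drop_na_columns(header, rows, na_values=('', 'NA', 'NaN')):\n",
    "    # Hypothetical helper: keep only columns with no missing value in any row.\n",
    "    keep = [i for i in range(len(header))\n",
    "            if all(row[i] not in na_values for row in rows)]\n",
    "    return [header[i] for i in keep], [[row[i] for i in keep] for row in rows]"
   ]
  },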
  {
   "cell_type": "code",
   "execution_count": 106,
   "id": "0fa059d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime longitudinal volatility \\\n",
    "#   --i-table genus-rf-table.qza \\\n",
    "#   --p-state-column State \\\n",
    "#   --m-metadata-file metadata_3.tsv \\\n",
    "#   --p-individual-id-column Elephant_ID \\\n",
    "#   --o-visualization volatility.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29135977",
   "metadata": {},
   "source": [
    "<h3>View volatility plot<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "id": "d452f484",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "viz_vol = Visualization.load('volatility.qzv')\n",
    "viz_vol"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28b0a5d5",
   "metadata": {},
   "source": [
    "<h3>Clearly there is volatility. Look at which 10 features are the most volatile<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "id": "7219d3fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime longitudinal feature-volatility \\\n",
    "#   --i-table genus_collapsed_table.qza  \\\n",
    "#   --m-metadata-file metadata_3.tsv \\\n",
    "#   --p-state-column State \\\n",
    "# --p-individual-id-column Elephant_ID \\\n",
    "# --p-feature-count 10 \\\n",
    "# --output-dir feature-volatility-state"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39991a4c",
   "metadata": {},
   "source": [
    "<h3>View feature volatility plot with respect to state<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "id": "27d12eae",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "viz_fvol_st = Visualization.load('feature-volatility-state/volatility_plot.qzv')\n",
    "viz_fvol_st"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2b004d7",
   "metadata": {},
   "source": [
    "<h2>Make a figure that to more clearly/thoroughly visualize the changes in taxa over time for each individual elephant<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b10b2155",
   "metadata": {},
   "source": [
    "<h3>Used instructions from https://forum.qiime2.org/t/standard-method-to-merge-taxonomy-and-feature-asv-data/23087 to merge taxonomy and feature table for an asv table. This could then be used to make a stacked barplot figure in R<h3>"
   ]
  },
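  {
   "cell_type": "markdown",
   "id": "5c1e9a05",
   "metadata": {},
   "source": [
    "<h4>For reference, here is a minimal Python sketch of a taxonomy and feature table merge. It is not a copy of the forum instructions. It assumes a feature table exported with biom convert --to-tsv (a \"# Constructed from biom file\" line, then a \"#OTU ID\" header) and a taxonomy export with \"Feature ID\" and \"Taxon\" columns. The helper merge_taxonomy and the \"Unassigned\" fallback label are hypothetical.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c1e9a06",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import io\n",
    "\n",
    "\n",
    "def merge_taxonomy(feature_tsv, taxonomy_tsv):\n",
    "    # Hypothetical helper: attach a Taxon column to a feature table.\n",
    "    # Map each Feature ID to its taxonomy string.\n",
    "    taxa = {}\n",
    "    for row in csv.DictReader(io.StringIO(taxonomy_tsv), delimiter='\\t'):\n",
    "        taxa[row['Feature ID']] = row['Taxon']\n",
    "    # Drop the biom comment line, then parse the tab-separated table.\n",
    "    lines = [ln for ln in feature_tsv.splitlines()\n",
    "             if not ln.startswith('# Constructed from biom file')]\n",
    "    table = list(csv.reader(lines, delimiter='\\t'))\n",
    "    header = [table[0][0].lstrip('#'), 'Taxon'] + table[0][1:]\n",
    "    merged = [header]\n",
    "    for row in table[1:]:\n",
    "        # Features missing from the taxonomy get a placeholder label\n",
    "        merged.append([row[0], taxa.get(row[0], 'Unassigned')] + row[1:])\n",
    "    return merged"
   ]
  },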
  {
   "cell_type": "markdown",
   "id": "d8ede243",
   "metadata": {},
   "source": [
    "<h3> First recreate taxonomy. qza since we had unzipped that file. To do so, first make a rep_seqs file according to our most recent feature table that has only those sequences we need (for streamlining)<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "id": "58e141ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-table filter-seqs \\\n",
    "# --i-data filtered_rep_seqs_3.qza \\\n",
    "# --i-table filtered_table_4.qza \\\n",
    "# --o-filtered-data filtered_rep_seqs_4.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2330a0c",
   "metadata": {},
   "source": [
    "<h4>Now make the taxonomy.qza file<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "id": "5a40783e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime feature-classifier classify-sklearn  \\\n",
    "#   --i-classifier silva-138-99-515-806-nb-classifier.qza  \\\n",
    "#   --i-reads filtered_rep_seqs_4.qza  \\\n",
    "#   --o-classification taxonomy.qza --p-n-jobs 4"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75afa9ea",
   "metadata": {},
   "source": [
    "<h4>Then follow the website instructions<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "id": "94330c1c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime taxa barplot \\\n",
    "#   --i-table filtered_table_4.qza \\\n",
    "#   --i-taxonomy taxonomy.qza \\\n",
    "#   --m-metadata-file metadata_2.tsv \\\n",
    "#   --o-visualization bar_plots.qzv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20955244",
   "metadata": {},
   "source": [
    "<h3>View and extract csv's by setting desired taxonomic level and then clicking \"CSV\"<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "id": "a8bf2304",
   "metadata": {},
   "outputs": [],
   "source": [
    "viz_bars = Visualization.load('bar_plots.qzv')\n",
    "viz_bars"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "70d6508a",
   "metadata": {},
   "source": [
    "<h3>Need a relative frequency table<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "012fdebc",
   "metadata": {},
   "source": [
    "<h4>Want to make the figure at the phylum level, so use collapsed table at the phylum level from core microbiome analysis above<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "id": "b727620e",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# !qiime feature-table relative-frequency \\\n",
    "# --i-table phylum_collapsed_table.qza \\\n",
    "# --o-relative-frequency-table rel-phyla-table.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17a0dcd8",
   "metadata": {},
   "source": [
    "<h3>Prepare resulting relative frequency table for use in R<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 115,
   "id": "176cce7f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !mv rel-phyla-table.qza rel-phyla-table.zip\n",
    "# !unzip rel-phyla-table.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0d942b9",
   "metadata": {},
   "source": [
    "<h4>Rename folder of unzipped files<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "id": "455c8b19",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !mv d89a1249-2655-44f9-adda-ab1f4a7ae78b rel-phyla-table"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17063ec6",
   "metadata": {},
   "source": [
    "<h4>Convert biom file to a .tsv<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "id": "b5411148",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !biom convert --to-tsv -i rel-phyla-table/data/feature-table.biom -o rel-phyla-table.tsv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd554730",
   "metadata": {},
   "source": [
    "<h4>Converted rel-phyla-table.tsv to rel-phyla-table.csv for use in R<h4>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c8e66c3",
   "metadata": {},
   "source": [
    "<h4>Removed \"Construction from .biom file\" line at the top of rel-phyla-table.csv<h4>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2ea4804",
   "metadata": {},
   "source": [
    "<h4>Also removed hash tag from OTU column title in rel-phyla-table.csv<h4>"
   ]
  },
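  {
   "cell_type": "markdown",
   "id": "5c1e9a07",
   "metadata": {},
   "source": [
    "<h4>The three manual edits above could be scripted instead. A minimal Python sketch using only the standard library, assuming the .tsv begins with the \"# Constructed from biom file\" line that biom convert writes. The helper biom_tsv_to_csv is hypothetical and not part of the original workflow.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c1e9a08",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import io\n",
    "\n",
    "\n",
    "def biom_tsv_to_csv(tsv_text):\n",
    "    # Hypothetical helper reproducing the manual edits:\n",
    "    # 1) drop the '# Constructed from biom file' line,\n",
    "    # 2) strip the leading '#' from the '#OTU ID' header,\n",
    "    # 3) rewrite the table as comma-separated values.\n",
    "    lines = tsv_text.splitlines()\n",
    "    if lines and lines[0].startswith('# Constructed from biom file'):\n",
    "        lines = lines[1:]\n",
    "    if lines and lines[0].startswith('#'):\n",
    "        lines[0] = lines[0][1:]\n",
    "    out = io.StringIO()\n",
    "    writer = csv.writer(out, lineterminator='\\n')\n",
    "    writer.writerows(csv.reader(lines, delimiter='\\t'))\n",
    "    return out.getvalue()\n",
    "\n",
    "\n",
    "# Example usage (file names from the cells above):\n",
    "# with open('rel-phyla-table.tsv') as src, open('rel-phyla-table.csv', 'w') as dst:\n",
    "#     dst.write(biom_tsv_to_csv(src.read()))"
   ]
  },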
  {
   "cell_type": "markdown",
   "id": "ea9fa852",
   "metadata": {},
   "source": [
    "<h3>Stacked barplot generation with microbiome data<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0494f675",
   "metadata": {},
   "source": [
    "<h3>https://www.youtube.com/watch?v=siIoupAnILk<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25b28b56",
   "metadata": {},
   "source": [
    "<h2>Load tidyverse and ggplot2 packages<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "id": "8ec56015",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(tidyverse)\n",
    "library(tidyr)\n",
    "library(ggplot2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06893afe",
   "metadata": {},
   "source": [
    "<h3>Load asv table at phylum level<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "id": "3e2c421c",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "data <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/rel-phyla-table.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1535a7a",
   "metadata": {},
   "source": [
    "<h3>Make data into a data frame<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "id": "ac97d2fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "data=data.frame(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f28a137f",
   "metadata": {},
   "source": [
    "<h3>Change periods in sample names to dashes for later matching with metadata columns<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "id": "879cbadf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "names(data) <- gsub(\".\", \"-\", names(data), fixed=TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d73b112",
   "metadata": {},
   "source": [
    "<h3>Transpose data frame so samples are in rows<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "id": "892dba15",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "data <- data%>%\n",
    "  pivot_longer(cols=c(-1),names_to=\"index\")%>%\n",
    "  pivot_wider(names_from=c(1))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c1da35a",
   "metadata": {},
   "source": [
    "<h3>Rename columns with phyla to be phylum name alone<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "id": "6d4bb23d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "data <- data %>%\n",
    "  rename(\n",
    "    Firmicutes = `d__Bacteria;p__Firmicutes`,\n",
    "    Thermoplasmatota = `d__Archaea;p__Thermoplasmatota`,\n",
    "    Bacteroidota = `d__Bacteria;p__Bacteroidota`,\n",
    "    Verrucomicrobiota = `d__Bacteria;p__Verrucomicrobiota`,\n",
    "    Myxococcota = `d__Bacteria;p__Myxococcota`,\n",
    "    Spirochaetota = `d__Bacteria;p__Spirochaetota`,\n",
    "    Armatimonadota = `d__Bacteria;p__Armatimonadota`,\n",
    "    Proteobacteria = `d__Bacteria;p__Proteobacteria`,\n",
    "    Halobacterota = `d__Archaea;p__Halobacterota`,\n",
    "    Patescibacteria = `d__Bacteria;p__Patescibacteria`,\n",
    "    Euryarchaeota = `d__Archaea;p__Euryarchaeota`,\n",
    "    Fibrobacterota = `d__Bacteria;p__Fibrobacterota`,\n",
    "    Bdellovibrionota = `d__Bacteria;p__Bdellovibrionota`,\n",
    "    Cyanobacteria = `d__Bacteria;p__Cyanobacteria`,\n",
    "    Desulfobacterota = `d__Bacteria;p__Desulfobacterota`,\n",
    "    Planctomycetota = `d__Bacteria;p__Planctomycetota`,\n",
    "    Synergistota = `d__Bacteria;p__Synergistota`,\n",
    "    Elusimicrobiota = `d__Bacteria;p__Elusimicrobiota`,\n",
    "    SAR324_clade.Marine_group_B = `d__Bacteria;p__SAR324_clade(Marine_group_B)`,\n",
    "    Chloroflexi = `d__Bacteria;p__Chloroflexi`,\n",
    "    Crenarchaeota = `d__Archaea;p__Crenarchaeota`,\n",
    "    Fusobacteriota = `d__Bacteria;p__Fusobacteriota`,\n",
    "    Actinobacteriota = `d__Bacteria;p__Actinobacteriota`,\n",
    "    Methylomirabilota = `d__Bacteria;p__Methylomirabilota`,\n",
    "    Parabasalia = `d__Eukaryota;p__Parabasalia`,\n",
    "    Deinococcota = `d__Bacteria;p__Deinococcota`,\n",
    "    Campilobacterota = `d__Bacteria;p__Campilobacterota`,\n",
    "    Sumerlaeota = `d__Bacteria;p__Sumerlaeota`,\n",
    "    Gemmatimonadota = `d__Bacteria;p__Gemmatimonadota`,\n",
    "    WPS.2 = `d__Bacteria;p__WPS-2`,\n",
    "    Deferribacterota = `d__Bacteria;p__Deferribacterota`,\n",
    "    NB1.j = `d__Bacteria;p__NB1-j`,\n",
    "    Acidobacteriota = `d__Bacteria;p__Acidobacteriota` \n",
    "  )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4267f8ab",
   "metadata": {},
   "source": [
    "<h3>Move matrix into a long format for ggplot2<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "id": "bbd9093d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "data <-data%>%\n",
    "  pivot_longer(-index, names_to = \"Phylum\", values_to = \"Relative_Frequency\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1584153",
   "metadata": {},
   "source": [
    "<h3>Get other relevant metadata columns for plotting to rejoin with new long format data frame. Use level-2.csv file (see below for code using just this file)<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "id": "b72b1e9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "meta_cols <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/level-2.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d9f651e",
   "metadata": {},
   "source": [
    "<h3>Make into data frame<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "id": "605fd884",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "meta_cols <- as.data.frame(meta_cols)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b0bfcba",
   "metadata": {},
   "source": [
    "<h3>Get rid of all columns except for those needed to plot<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 127,
   "id": "507bab1e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "meta_cols <- select(meta_cols, c(index,Elephant_ID,Family,Date.Sampled,AgeClass,Sex,Season,Age.at.Sampling))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e81cc790",
   "metadata": {},
   "source": [
    "<h3>Read Date.Sampled in as a date<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e26a8044",
   "metadata": {},
   "source": [
    "<h4>Load lubridate, a package that tells R how to interpret dates easily<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 128,
   "id": "087a01a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(lubridate) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1367fb16",
   "metadata": {},
   "source": [
    "<h4>Show R that Date.Sampled is in month/day/year format<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 129,
   "id": "8796823c",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "meta_cols <- meta_cols %>% \n",
    "  mutate(Date.Sampled = lubridate::mdy(Date.Sampled))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e77f7975",
   "metadata": {},
   "source": [
    "<h3>Add wanted metadata columns into the long format data frame<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 130,
   "id": "eada8a36",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "data <- data %>%\n",
    "  left_join(meta_cols, by = \"index\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df03c77d",
   "metadata": {},
   "source": [
    "<h3>Reorder taxa based on relative abundance, using \"taxa-bar-plots-for-stacked-figure.qzv\" within jupyter notebook (qiime2 output)<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 131,
   "id": "f89ebb4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "phyla_order <- c(\"Firmicutes\",\n",
    "                 \"Bacteroidota\",\n",
    "                 \"Euryarchaeota\",\n",
    "                 \"Actinobacteriota\",\n",
    "                 \"Proteobacteria\",\n",
    "                 \"Halobacterota\",\n",
    "                 \"Verrucomicrobiota\",\n",
    "                 \"Planctomycetota\",\n",
    "                 \"Spirochaetota\",\n",
    "                 \"Armatimonadota\",\n",
    "                 \"Synergistota\",\n",
    "                 \"Chloroflexi\",\n",
    "                 \"Fibrobacterota\",\n",
    "                 \"Desulfobacterota\",\n",
    "                 \"Thermoplasmatota\",\n",
    "                 \"Cyanobacteria\",\n",
    "                 \"Patescibacteria\",\n",
    "                 \"Bdellovibrionota\",\n",
    "                 \"Elusimicrobiota\",\n",
    "                 \"WPS.2\",\n",
    "                 \"Myxococcota\",\n",
    "                 \"SAR324_clade.Marine_group_B\",\n",
    "                 \"Parabasalia\",\n",
    "                 \"Fusobacteriota\",\n",
    "                 \"Acidobacteriota\",\n",
    "                 \"Campilobacterota\",\n",
    "                 \"Deferribacterota\",\n",
    "                 \"Gemmatimonadota\",\n",
    "                 \"Crenarchaeota\",\n",
    "                 \"Deinococcota\",\n",
    "                 \"Methylomirabilota\",\n",
    "                 \"NB1.j\",\n",
    "                 \"Sumerlaeota\")\n",
    "\n",
    "data <- data %>%\n",
    "  mutate(Phylum = factor(Phylum, levels=phyla_order))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "deb9380e",
   "metadata": {},
   "source": [
    "<h3>Group less abundant groups together for the plot, found code at https://campus.datacamp.com/courses/categorical-data-in-the-tidyverse/manipulating-factor-variables?ex=8\n",
    "<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "id": "1b0ed324",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "data <- data %>%\n",
    "  mutate(Phylum = fct_other(Phylum,\n",
    "                            drop = c( \"Chloroflexi\",\n",
    "                                      \"Fibrobacterota\",\n",
    "                                      \"Desulfobacterota\",\n",
    "                                      \"Thermoplasmatota\",\n",
    "                                      \"Cyanobacteria\",\n",
    "                                      \"Patescibacteria\",\n",
    "                                      \"Bdellovibrionota\",\n",
    "                                      \"Elusimicrobiota\",\n",
    "                                      \"WPS.2\",\n",
    "                                      \"Myxococcota\",\n",
    "                                      \"SAR324_clade.Marine_group_B\",\n",
    "                                      \"Parabasalia\",\n",
    "                                      \"Fusobacteriota\",\n",
    "                                      \"Acidobacteriota\",\n",
    "                                      \"Campilobacterota\",\n",
    "                                      \"Deferribacterota\",\n",
    "                                      \"Gemmatimonadota\",\n",
    "                                      \"Crenarchaeota\",\n",
    "                                      \"Deinococcota\",\n",
    "                                      \"Methylomirabilota\",\n",
    "                                      \"NB1.j\",\n",
    "                                      \"Sumerlaeota\")))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ae5b459",
   "metadata": {},
   "source": [
    "<h3>Load ggh4x package to be able to nest facets so date will be visible<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 133,
   "id": "62881d8e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(ggh4x)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8b0ecf6",
   "metadata": {},
   "source": [
    "<h3>Load forcats package to order facets as desired<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 134,
   "id": "9c69803c",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(forcats)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18f6127d",
   "metadata": {},
   "source": [
    "<h2><font color='purple'>The remaining code used to make the supplementary figure (ggplot2 code that goes through each individual) can be found further along in this Jupyter Notebook: just search \"Below is the remaining code for the stacked bar plot supplementary figure\" to be taken there.<font color='purple'><h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3a0cab4",
   "metadata": {},
   "source": [
    "<h2><font color='green'>Are there differences in alpha diversity among samples, and what factors affect alpha diversity?</font><h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa6aaf45",
   "metadata": {},
   "source": [
    "<h2>Use q2-breakaway to check for alpha diversity in non-rarefied samples<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de6c60f3",
   "metadata": {},
   "source": [
    "<h2>Output is a richness estimate - see Willis and Bunge 2015 and https://github.com/statdivlab/q2-breakaway<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "id": "2192fc3b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime breakaway alpha \\\n",
    "# --i-table filtered_table_4.qza \\\n",
    "# --o-alpha-diversity alpha.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c80c98f",
   "metadata": {},
   "source": [
    "<h2>Create visualization of alpha diversity estimates with error bars<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "id": "9ae7ffa1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime breakaway plot \\\n",
    "# --i-alpha-diversity alpha.qza \\\n",
    "# --o-visualization alpha_plot"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4def075f",
   "metadata": {},
   "source": [
    "<h2>View plot<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "id": "8d8ec1ac",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "viz_alpha = Visualization.load('alpha_plot.qzv')\n",
    "viz_alpha"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c4e1855",
   "metadata": {},
   "source": [
    "<h2>Export alpha diversity into a .tsv file to add to metadata_2 file<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61c57c45",
   "metadata": {},
   "source": [
    "<h3>Note: made new metadata file, metadata_2.csv (.csv for use in R, see below), containing only the samples remaining after quality control steps (317 samples), and added covariates used in the linear regression (see below)<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec6da550",
   "metadata": {},
   "source": [
    "<h3>Also added resulting alpha-diversity values, from the export above in alpha_data -> data folders, as a column to metadata_2<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "id": "fb8ca2de",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !mv alpha.qza alpha.zip\n",
    "# !unzip alpha.zip\n",
    "\n",
    "# #Rename folder of unzipped files\n",
    "\n",
    "# !mv 7a4f2eac-e38a-4d80-a9a4-9b7bd8cda97e alpha_data"
   ]
  },
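  {
   "cell_type": "markdown",
   "id": "5c1e9a09",
   "metadata": {},
   "source": [
    "<h4>Adding the alpha diversity values to metadata_2 was done by hand. The following is a minimal Python sketch of an equivalent join. It assumes the exported estimates are in a tab-separated file whose first column holds sample IDs and whose second column holds the estimate (for example, alpha_data/data/alpha-diversity.tsv; file name assumed), and that the metadata sample IDs are in a column passed as id_column. The helper add_alpha_column and the default column name breakaway_richness are hypothetical.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c1e9a0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import io\n",
    "\n",
    "\n",
    "def add_alpha_column(metadata_csv, alpha_tsv, id_column,\n",
    "                     new_column='breakaway_richness'):\n",
    "    # Hypothetical helper: look up each sample's estimate by sample ID.\n",
    "    alpha_rows = list(csv.reader(io.StringIO(alpha_tsv), delimiter='\\t'))\n",
    "    estimates = {row[0]: row[1] for row in alpha_rows[1:]}  # skip header\n",
    "    reader = csv.DictReader(io.StringIO(metadata_csv))\n",
    "    out = io.StringIO()\n",
    "    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + [new_column],\n",
    "                            lineterminator='\\n')\n",
    "    writer.writeheader()\n",
    "    for row in reader:\n",
    "        # Samples without an estimate are left blank rather than dropped\n",
    "        row[new_column] = estimates.get(row[id_column], '')\n",
    "        writer.writerow(row)\n",
    "    return out.getvalue()"
   ]
  },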
  {
   "cell_type": "markdown",
   "id": "4194bedd",
   "metadata": {},
   "source": [
    "<h2>Behind the scenes steps taken before linear regression:<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cac562e6",
   "metadata": {},
   "source": [
    "<h3>Covariate derivation:<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "392c745c",
   "metadata": {},
   "source": [
    "<h4>NDVI values (mean and standard deviation) extracted from Google Earth for core area drawn from elephant gps collars using the code available in manuscript's additional files<h4>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5b11a41",
   "metadata": {},
   "source": [
    "<h4>Binaring grazing values determined according to NDVI values based on the results of Cerling et al. 2009, see below and manuscript's additional files for further details<h4>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5c221aa",
   "metadata": {},
   "source": [
    "<h4>Livestock average of averages measurement drawn from monthly mammal census counts in the Samburu and Buffalo Springs National Reserves, data and calculations available in manuscript's additional files<h4>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "affa8ceb",
   "metadata": {},
   "source": [
    "<h4>Age and sex data known from long-term monitoring. Most individuals have birthdates estimated with an accuracy of within three weeks. Individuals without decimal values in their age estmate were estimated more coarsely, due to being older than when the study started or having entered the population later, but within three year accuracy and likely more accurately (see Wittemyer et al. 2013, Wittemyer et al. 2021<h4> "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60da8337",
   "metadata": {},
   "source": [
    "<h4>Family and individual ID included as random factors. Family later removed due to convergence issues and fact that some of the sampled elephants were no longer with their originally family, so interpretation would be difficult<h4>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "abcab194",
   "metadata": {},
   "source": [
    "<h4>Sequencing plate is also included as a control random variable, as samples were divided among four sequencing plates for laboratory analysis<h4>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf3ca0c5",
   "metadata": {},
   "source": [
    "<h2>Install needed R package for mixed effects linear regression<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6bce9516",
   "metadata": {},
   "source": [
    "<h1><font color='red'>Please note that the analysis below was completed prior to the first review. The analysis following reviewer suggestions can be found towards the end of this Jupyter Notebook; search for \"Post Review Alpha Diversity Analysis\" to get there.</font><h1>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "id": "0475a5a5",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# %R install.packages(\"lme4\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08969ed6",
   "metadata": {},
   "source": [
    "<h2>Check the correlation between NDVI standard deviation and NDVI mean, to decide whether to use one or both<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "id": "89a355ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# data <- read.csv('metadata_2.csv', header = TRUE) #import data\n",
    "\n",
    "# cor.test(data$NDVI, data$NDVI_stdDev)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b973f19",
   "metadata": {},
   "source": [
    "<h2>The correlation is 0.98, so only one of the two should be included; models with each are compared below to see which fits better (lowest AICc)<h2>"
   ]
  },
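  {
   "cell_type": "markdown",
   "id": "9e1c4a01",
   "metadata": {},
   "source": [
    "<h4>For reference, here is a minimal Python sketch of what cor.test reports by default: Pearson's r and its t statistic, t = r * sqrt((n - 2) / (1 - r^2)), on n - 2 degrees of freedom. It assumes numpy is available in this notebook's Python kernel. The toy NDVI values are hypothetical and are not study data.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e1c4a02",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def pearson_with_t(x, y):\n",
    "    # Pearson's r plus the t statistic and degrees of freedom that R's cor.test reports\n",
    "    x = np.asarray(x, dtype=float)\n",
    "    y = np.asarray(y, dtype=float)\n",
    "    n = len(x)\n",
    "    r = float(np.corrcoef(x, y)[0, 1])\n",
    "    t = r * math.sqrt((n - 2) / (1 - r ** 2))\n",
    "    return r, t, n - 2\n",
    "\n",
    "\n",
    "# Hypothetical toy values (not study data)\n",
    "ndvi_mean = [0.22, 0.25, 0.31, 0.40, 0.35, 0.27]\n",
    "ndvi_sd = [0.05, 0.06, 0.08, 0.11, 0.09, 0.06]\n",
    "r, t, dof = pearson_with_t(ndvi_mean, ndvi_sd)\n",
    "print(round(r, 3), round(t, 2), dof)"
   ]
  },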
  {
   "cell_type": "code",
   "execution_count": 141,
   "id": "da952602",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R \n",
    "\n",
    "# library(lme4)\n",
    "\n",
    "# data <- read.csv('metadata_2.csv', header = TRUE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef76de7a",
   "metadata": {},
   "source": [
    "<h2>Set factors<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "id": "5360f827",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# data$Elephant_ID <- as.factor(data$Elephant_ID)\n",
    "# data$Family <- as.factor(data$Family)\n",
    "# data$Plate <- as.factor(data$Plate)\n",
    "# data$Sex <- as.factor(data$Sex)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5c48f02",
   "metadata": {},
   "source": [
    "<h2>Check data structure<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "id": "a82062bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# str(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc665aba",
   "metadata": {},
   "source": [
    "<h3>Looks good<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "617c7eea",
   "metadata": {},
   "source": [
    "<h2>Ensure the data are stored as a data frame (read.csv already returns one, so this is a safeguard) to avoid potential errors and for easier manipulation<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 144,
   "id": "30991335",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# data <- as.data.frame(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86649d1e",
   "metadata": {},
   "source": [
    "<h2>Check response variable distribution<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 145,
   "id": "86669ccd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# hist(data$Alpha.Diversity)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05e40a4a",
   "metadata": {},
   "source": [
    "<h3>The distribution is right-skewed and approximately lognormal, so Alpha.Diversity is log-transformed in the models below<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "972c06fe",
   "metadata": {},
   "source": [
    "<h2>Since the effect of sex may depend on age, we include an age-by-sex interaction term in the model.<h2>"
   ]
  },
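  {
   "cell_type": "markdown",
   "id": "9e1c4a03",
   "metadata": {},
   "source": [
    "<h4>This minimal sketch shows what the formula term Age.at.Sampling.stdz*Sex expands to. In R model formulas, a*b means a + b + a:b. The fixed-effect design matrix therefore gets four columns: an intercept, an age column, a sex indicator (treatment coding, with the first factor level as reference) and their product. The Python below builds these columns by hand. The sex labels 'F'/'M', the reference level and the rows are hypothetical. Under this coding, the age slope for the reference sex is the age coefficient; for the other sex it is the age coefficient plus the interaction coefficient.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e1c4a04",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def design_with_interaction(age_std, sex, reference='F'):\n",
    "    # Columns: intercept, age, sex indicator (0 = reference level), age x sex\n",
    "    age = np.asarray(age_std, dtype=float)\n",
    "    sex_ind = np.array([0.0 if s == reference else 1.0 for s in sex])\n",
    "    return np.column_stack([np.ones_like(age), age, sex_ind, age * sex_ind])\n",
    "\n",
    "\n",
    "# Hypothetical rows: standardized age at sampling and sex\n",
    "X = design_with_interaction([-1.0, 0.5, 1.2], ['F', 'M', 'M'])\n",
    "print(X)"
   ]
  },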
  {
   "cell_type": "markdown",
   "id": "c4460d30",
   "metadata": {},
   "source": [
    "<h2>Given that NDVI mean and standard deviation are highly correlated (as mentioned above), we need to assess whether a model with the mean or with the standard deviation fits better.<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0d69887",
   "metadata": {},
   "source": [
    "<h2>Start with model using NDVI mean, called model.alpha.1<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 146,
   "id": "ab09c4c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# model.alpha.1 <- lmer(log(Alpha.Diversity) ~ Age.at.Sampling.stdz*Sex + NDVI_stdz + Livestock_avg_of_avgs_stdz + (1|Elephant_ID) + (1|Family) + (1|Plate), data = data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84f0065f",
   "metadata": {},
   "source": [
    "<h3>Including both individual and family as random effects produces a singular fit, so family is excluded to reduce model complexity. Detecting genetic effects requires very large numbers of samples (see Grieneisen et al. 2021), and because elephant families move together, environmental and genetic effects would be hard to tease apart anyway. In addition, due to poaching and drought in the study population, some individuals are no longer with their family. Because the relative environmental and genetic contributions are unknown, it is hard to decide which family to assign them to<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 147,
   "id": "16b1d3cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# model.alpha.1 <- lmer(log(Alpha.Diversity) ~ Age.at.Sampling.stdz*Sex + NDVI_stdz + Livestock_avg_of_avgs_stdz + (1|Elephant_ID) + (1|Plate), data = data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74638bdc",
   "metadata": {},
   "source": [
    "<h2>Create a model with NDVI standard deviation, model.alpha.2, to compare AICc values<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 148,
   "id": "cff5ce69",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# model.alpha.2 <- lmer(log(Alpha.Diversity) ~ Age.at.Sampling.stdz*Sex + NDVI_stdDev_stdz + Livestock_avg_of_avgs_stdz + (1|Elephant_ID) + (1|Plate), data = data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af0896ec",
   "metadata": {},
   "source": [
    "<h2>Check AICc values with the MuMIn package (the package may need to be installed first)<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 149,
   "id": "e6b0ca22",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# library(MuMIn)\n",
    "\n",
    "# AICc(model.alpha.1, model.alpha.2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6da6f57c",
   "metadata": {},
   "source": [
    "<h2>The AICc values are nearly identical; model.alpha.1 is retained because its AICc is marginally lower<h2>"
   ]
  },
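  {
   "cell_type": "markdown",
   "id": "9e1c4a05",
   "metadata": {},
   "source": [
    "<h4>For reference, AICc applies a small-sample correction to AIC: AICc = AIC + 2k(k + 1) / (n - k - 1), with AIC = -2 log L + 2k. Here k is the number of estimated parameters (fixed effects plus variance components) and n is the number of observations. The Python sketch below reproduces the formula. It also computes the differences from the best model (delta AICc) and Akaike weights, which help judge whether a small AICc gap is meaningful. By a common rule of thumb, models within about 2 AICc units have similar support. All log-likelihoods and parameter counts are hypothetical. One caveat is worth checking: lmer fits by REML by default, and REML likelihoods are generally not comparable between models with different fixed effects. Such comparisons are therefore usually made on fits with REML = FALSE.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e1c4a06",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "\n",
    "def aicc(log_lik, k, n):\n",
    "    # Small-sample corrected AIC (AICc)\n",
    "    aic = -2.0 * log_lik + 2.0 * k\n",
    "    return aic + (2.0 * k * (k + 1)) / (n - k - 1)\n",
    "\n",
    "\n",
    "def delta_and_weights(scores):\n",
    "    # Differences from the lowest score, and Akaike weights (which sum to 1)\n",
    "    best = min(scores.values())\n",
    "    deltas = {m: s - best for m, s in scores.items()}\n",
    "    rel = {m: math.exp(-0.5 * d) for m, d in deltas.items()}\n",
    "    total = sum(rel.values())\n",
    "    return deltas, {m: v / total for m, v in rel.items()}\n",
    "\n",
    "\n",
    "# Hypothetical log-likelihoods, parameter counts and sample size\n",
    "scores = {\n",
    "    'model.alpha.1': aicc(-120.4, 9, 150),\n",
    "    'model.alpha.2': aicc(-120.5, 9, 150),\n",
    "}\n",
    "deltas, weights = delta_and_weights(scores)\n",
    "print(deltas, weights)"
   ]
  },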
  {
   "cell_type": "markdown",
   "id": "7b2475a4",
   "metadata": {},
   "source": [
    "<h2>The next code derives the binary grazing covariate. Additional information is available in the manuscript's additional files<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b62ed76",
   "metadata": {},
   "source": [
    "<h2>First, plot NDVI over time to determine when to set the grazing variable to 1 (this plot, together with a derivation table for the Grazing covariate, is included in the manuscript's additional files)<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c6221b1",
   "metadata": {},
   "source": [
    "<h3>Load the .csv of mean NDVI downloaded from Google Earth Engine and convert it to a data frame for data manipulation (see below)<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 150,
   "id": "040f84e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "data <- read.csv(\"ee-mean-NDVI-FemaleKernelCore-chart.csv\",header=TRUE)\n",
    "\n",
    "data <- as.data.frame(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37e17c85",
   "metadata": {},
   "source": [
    "<h3>View data<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 151,
   "id": "0cf3a35a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "str(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c81a561",
   "metadata": {},
   "source": [
    "<h3>system.time_start needs to be converted from a character string to a Date so that R treats it as a date<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b91afd76",
   "metadata": {},
   "source": [
    "<h4>Load packages used to convert system.time_start to a date. Here and from now on, packages may need to be installed first<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 152,
   "id": "85bc8971",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(lubridate)\n",
    "library(tidyverse)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "196e0dc9",
   "metadata": {},
   "source": [
    "<h4>Tell R that system.time_start is in day-month-year format<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 153,
   "id": "dfe167c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "data <- data %>% \n",
    "  mutate(system.time_start = lubridate::dmy(system.time_start))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6765312",
   "metadata": {},
   "source": [
    "<h3>Load packages for plotting<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 154,
   "id": "bd6d1a58",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(ggplot2)\n",
    "library(ggrepel)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14e9d2d0",
   "metadata": {},
   "source": [
    "<h3>Plot<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 155,
   "id": "9f31feba",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot1 <- ggplot(data, aes(system.time_start, Mean.NDVI)) + geom_line() +\n",
    "  geom_point() + \n",
    "  theme_bw() +\n",
    "  scale_x_date(breaks = pretty(data$system.time_start, n = 31))\n",
    "\n",
    "plot1 <- plot1 + geom_label_repel(aes(label = Mean.NDVI))\n",
    "\n",
    "plot1 <- plot1 + theme(axis.text.x = element_text(angle = 90, vjust = 1, hjust=1))\n",
    "\n",
    "plot1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "027f28a2",
   "metadata": {},
   "source": [
    "<h2>Again, see the manuscript's supplementary figures for the derivation of the binary grazing variable. Because grazing lags behind NDVI, the variable is set to 1 during peak NDVI and returns to 0 once NDVI falls back to baseline (0.24, as per Cerling et al. 2009)<h2>"
   ]
  },
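  {
   "cell_type": "markdown",
   "id": "9e1c4a07",
   "metadata": {},
   "source": [
    "<h4>This minimal Python sketch shows one way such a lagged threshold rule could be encoded for a monthly NDVI series. A month is flagged as grazing (1) when NDVI a set number of months earlier was above the 0.24 baseline, and 0 otherwise. The function name, the one-month lag and the NDVI values are hypothetical illustrations. The actual Grazing assignments are those in the derivation table in the manuscript's additional files.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e1c4a08",
   "metadata": {},
   "outputs": [],
   "source": [
    "def lagged_grazing(ndvi, baseline=0.24, lag=1):\n",
    "    # Flag 1 when NDVI `lag` months earlier exceeded the baseline, else 0.\n",
    "    # lag is a hypothetical parameter, not a value taken from the manuscript.\n",
    "    flags = []\n",
    "    for i in range(len(ndvi)):\n",
    "        j = i - lag\n",
    "        flags.append(1 if j >= 0 and ndvi[j] > baseline else 0)\n",
    "    return flags\n",
    "\n",
    "\n",
    "# Hypothetical monthly mean NDVI values\n",
    "monthly_ndvi = [0.22, 0.23, 0.35, 0.41, 0.30, 0.24, 0.23]\n",
    "print(lagged_grazing(monthly_ndvi))"
   ]
  },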
  {
   "cell_type": "markdown",
   "id": "e2a82ee3",
   "metadata": {},
   "source": [
    "<h2>This variable was added to the metadata as \"Grazing\" (1 = diet with a substantial grass component; 0 = diet composed primarily of browse)<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a1b864f",
   "metadata": {},
   "source": [
    "<h2>Compare three models (Grazing together with NDVI mean, NDVI mean only, and Grazing only) to see which has the lowest AICc<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01775e05",
   "metadata": {},
   "source": [
    "<h3>model.alpha.3 has both grazing and NDVI mean<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64fa22f1",
   "metadata": {},
   "source": [
    "<h4>Reload the metadata (replacing the NDVI data currently stored in data) and reset the factors<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 156,
   "id": "03a1d488",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# data <- read.csv(\"metadata_2.csv\", header=TRUE)\n",
    "\n",
    "# data$Elephant_ID <- as.factor(data$Elephant_ID)\n",
    "# data$Plate <- as.factor(data$Plate)\n",
    "# data$Sex <- as.factor(data$Sex)\n",
    "# data$Grazing <- as.factor(data$Grazing)\n",
    "\n",
    "\n",
    "# model.alpha.3 <- lmer(log(Alpha.Diversity) ~ Age.at.Sampling.stdz*Sex + NDVI_stdz + Grazing + Livestock_avg_of_avgs_stdz + (1|Elephant_ID) + (1|Plate), data = data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef292c49",
   "metadata": {},
   "source": [
    "<h3>model.alpha.4 has just grazing and no NDVI mean<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 157,
   "id": "dfbd7b43",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# model.alpha.4 <- lmer(log(Alpha.Diversity) ~ Age.at.Sampling.stdz*Sex + Grazing + Livestock_avg_of_avgs_stdz + (1|Elephant_ID) + (1|Plate), data = data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05e5f65f",
   "metadata": {},
   "source": [
    "<h3>Check AICc values (model.alpha.1 only has NDVI mean, see further above)<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 158,
   "id": "1c08fb41",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# library(MuMIn)\n",
    "\n",
    "# AICc(model.alpha.1,  model.alpha.3, model.alpha.4)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5ef4b67",
   "metadata": {},
   "source": [
    "<h2>The model with only Grazing (model.alpha.4) has the lowest AICc, so it is our final model.<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5376f23a",
   "metadata": {},
   "source": [
    "<h2>Look at the summary of our final model, model.alpha.4, to see results<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 159,
   "id": "9ac460d9",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# summary(model.alpha.4)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e869317e",
   "metadata": {},
   "source": [
    "<h2>Check confidence intervals<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "id": "9c9a8dfb",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# confint(model.alpha.4, oldNames = FALSE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8bad9e49",
   "metadata": {},
   "source": [
    "<h2>Livestock has an estimated negative association with alpha diversity, and its 95% confidence interval does not overlap zero.<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bea7a6fa",
   "metadata": {},
   "source": [
    "<h3>Specifically, a one-unit increase in the standardized livestock covariate (Livestock_avg_of_avgs_stdz) corresponds to an estimated ~12% (95% confidence interval (CI) 4% to 20%) decrease in alpha diversity.<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b99c1b9",
   "metadata": {},
   "source": [
    "<h2>Grazing also has an estimated negative association with alpha diversity, and its CI does not overlap zero.<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "385c0520",
   "metadata": {},
   "source": [
    "<h3>Having a diet with a substantial grass component was associated with an estimated ~18% (95% CI <1% to 32%) decrease in alpha diversity.<h3>"
   ]
  },
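  {
   "cell_type": "markdown",
   "id": "9e1c4a09",
   "metadata": {},
   "source": [
    "<h4>Because alpha diversity is log-transformed in the model, a fixed-effect coefficient b on the log scale converts to a percent change in alpha diversity as 100 * (exp(b) - 1), holding the other predictors constant. The same conversion applies to each confidence limit, and a negative b gives a decrease. Below is a minimal Python sketch of that conversion. The coefficient and interval are hypothetical placeholders, not the fitted estimates from model.alpha.4.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e1c4a0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "\n",
    "def percent_change(beta):\n",
    "    # Percent change in the original-scale response per one-unit increase in a\n",
    "    # predictor, when the response was modelled as log(y)\n",
    "    return 100.0 * (math.exp(beta) - 1.0)\n",
    "\n",
    "\n",
    "def percent_change_ci(beta, lower, upper):\n",
    "    # exp() is monotonic, so confidence limits can be transformed endpoint by endpoint\n",
    "    return percent_change(beta), percent_change(lower), percent_change(upper)\n",
    "\n",
    "\n",
    "# Hypothetical log-scale coefficient and 95% CI (not fitted values)\n",
    "est, lo, hi = percent_change_ci(-0.10, -0.18, -0.02)\n",
    "print(round(est, 1), round(lo, 1), round(hi, 1))"
   ]
  },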
  {
   "cell_type": "markdown",
   "id": "24ffac30",
   "metadata": {},
   "source": [
    "<h2>Look at the per-elephant coefficients (fixed effects plus each elephant's random intercept deviation) to aid interpretation<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 161,
   "id": "4afad574",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# coef(model.alpha.4)$Elephant_ID"
   ]
  },
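  {
   "cell_type": "markdown",
   "id": "9e1c4a0b",
   "metadata": {},
   "source": [
    "<h4>For a random-intercept model such as model.alpha.4, lme4's coef() returns one row per level of a grouping factor. Each row contains the fixed-effect estimates, with that level's conditional mode from ranef() added to the intercept; the slopes are identical across levels. Here is a minimal Python sketch of that arithmetic, using hypothetical numbers and elephant IDs:<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e1c4a0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "def per_level_intercepts(fixed_intercept, conditional_modes):\n",
    "    # coef()-style intercepts: fixed intercept plus each level's ranef() deviation\n",
    "    return {level: fixed_intercept + dev for level, dev in conditional_modes.items()}\n",
    "\n",
    "\n",
    "# Hypothetical fixed intercept (log scale) and ranef() deviations by Elephant_ID\n",
    "modes = {'E01': 0.12, 'E02': -0.05, 'E03': -0.07}\n",
    "print(per_level_intercepts(5.5, modes))"
   ]
  },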
  {
   "cell_type": "markdown",
   "id": "d2a1637f",
   "metadata": {},
   "source": [
    "<h2>Look at the per-plate coefficients (fixed effects plus each plate's random intercept deviation) to aid interpretation<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 162,
   "id": "0b49963e",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# coef(model.alpha.4)$Plate"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57760fed",
   "metadata": {},
   "source": [
    "<h2>In the manuscript, the results from the linear regression of alpha diversity are reported in Table 2.<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71b22fd1",
   "metadata": {},
   "source": [
    "<h2>Check model fit - for these checks, we used code from https://rpubs.com/eointravers/xv-lmm-residuals<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06bb71ad",
   "metadata": {},
   "source": [
    "<h3>Load tidyverse for data manipulation (mutate)<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 163,
   "id": "c49f2140",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# library(tidyverse)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b9c82a3",
   "metadata": {},
   "source": [
    "<h3>Make new columns in the data for diagnostic values (see below)<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6264d385",
   "metadata": {},
   "source": [
    "<h4>Residuals and squared errors<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 164,
   "id": "78d00a2e",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# data = mutate(data,\n",
    "#                     prediction = fitted(model.alpha.4),\n",
    "#                     resid = log(Alpha.Diversity) - prediction,\n",
    "#                     resid2 = resid^2) \n",
    "\n",
    "# plot(model.alpha.4)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d6e2987",
   "metadata": {},
   "source": [
    "<h3>Continue diagnostics: plot residuals against fitted values with a loess smoother<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "id": "811885da",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# smoothplot = function(data, xvar, yvar, jitter = .2){\n",
    "#   ggplot(data, aes({{xvar}}, {{yvar}})) +\n",
    "#     geom_point(position = position_jitter(width = jitter),\n",
    "#                alpha = .5) +\n",
    "#     stat_smooth(method = 'loess', formula = y ~ x)\n",
    "# }\n",
    "# smoothplot(data, prediction, resid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a90ea455",
   "metadata": {},
   "source": [
    "<h3>The plot above is not perfect, but acceptable<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f97df25a",
   "metadata": {},
   "source": [
    "<h3>Check the overall distribution of the random effects (rfx)<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 166,
   "id": "3726285a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# rfx = data.frame(ranef(model.alpha.4))\n",
    "# hist(rfx$condval)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "980d84c8",
   "metadata": {},
   "source": [
    "<h3>The random effects are centered around zero, as expected, with relatively homogeneous variance<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0b3b2db",
   "metadata": {},
   "source": [
    "<h3>Check livestock variable<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 167,
   "id": "360ee2f4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# smoothplot(data, Livestock_avg_of_avgs_stdz, resid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac86cad8",
   "metadata": {},
   "source": [
    "<h3>Looks okay given the gaps between points. The residuals seem acceptable, and the fit is very good except at the highest livestock values<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e86589a7",
   "metadata": {},
   "source": [
    "<h3>Livestock squared residuals<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 168,
   "id": "cd05c149",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# smoothplot(data, Livestock_avg_of_avgs_stdz, resid2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4bb73b6",
   "metadata": {},
   "source": [
    "<h3>Looks usable<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9045a0ed",
   "metadata": {},
   "source": [
    "<h3>Visualize the mean squared error (with standard errors) across livestock values<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 169,
   "id": "b547ce36",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# ggplot(data, aes(Livestock_avg_of_avgs_stdz, resid2)) + \n",
    "#   stat_summary() + labs(y = 'Mean Squared Error')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f90d556",
   "metadata": {},
   "source": [
    "<h3>Less accurate for high livestock values, but still looks usable<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d507b78f",
   "metadata": {},
   "source": [
    "<h3>Check individual residuals<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 170,
   "id": "b9647748",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# smoothplot(data, Elephant_ID, resid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d57dc61",
   "metadata": {},
   "source": [
    "<h3>Looks very good<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0e0f7c2",
   "metadata": {},
   "source": [
    "<h3>Check individual squared residuals<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 171,
   "id": "764cd364",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# smoothplot(data, Elephant_ID, resid2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9770c4af",
   "metadata": {},
   "source": [
    "<h3>Acceptable<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "583be95f",
   "metadata": {},
   "source": [
    "<h3>Another visualization of individual squared residuals<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "id": "8aef3ed7",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# ggplot(data, aes(Elephant_ID, resid2)) + \n",
    "#   stat_summary() + labs(y = 'Mean Squared Error')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a13000ff",
   "metadata": {},
   "source": [
    "<h3>Looks good, although 12 elephants show missing values: they were sampled only once, so no standard error can be computed around their mean squared residual.<h3>"
   ]
  },
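  {
   "cell_type": "markdown",
   "id": "9e1c4a0d",
   "metadata": {},
   "source": [
    "<h4>When no summary function is given, ggplot2's stat_summary() falls back to mean_se(), i.e. the mean plus or minus one standard error. A standard error needs at least two observations. The minimal Python sketch below computes the same per-group summary of squared residuals. It uses pandas (assumed installed) and hypothetical residuals, and shows that an elephant with a single sample gets a NaN standard error:<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e1c4a0e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical log-scale residuals, by elephant\n",
    "df = pd.DataFrame({\n",
    "    'Elephant_ID': ['E01', 'E01', 'E01', 'E02', 'E02', 'E03'],\n",
    "    'resid': [0.10, -0.20, 0.05, 0.30, -0.10, 0.15],\n",
    "})\n",
    "# Squared residuals, as in the resid2 column above\n",
    "df['resid2'] = df['resid'] ** 2\n",
    "\n",
    "# Mean squared error, its standard error (ddof = 1) and sample count per elephant\n",
    "summary = df.groupby('Elephant_ID')['resid2'].agg(['mean', 'sem', 'count'])\n",
    "print(summary)"
   ]
  },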
  {
   "cell_type": "markdown",
   "id": "eaa9a963",
   "metadata": {},
   "source": [
    "<h3>Grazing also showed an effect, check its fit<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 173,
   "id": "bb54def7",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# smoothplot(data, Grazing, resid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f42b0a8f",
   "metadata": {},
   "source": [
    "<h3>Good<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8df655f7",
   "metadata": {},
   "source": [
    "<h3>Check grazing squared residuals<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 174,
   "id": "8495c37b",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# smoothplot(data, Grazing, resid2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad6c4ec2",
   "metadata": {},
   "source": [
    "<h3>Looks fine<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1529a7b",
   "metadata": {},
   "source": [
    "<h3>View the mean squared error with standard errors<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 175,
   "id": "a4d5e3ed",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# ggplot(data, aes(Grazing, resid2)) + \n",
    "#   stat_summary() + labs(y = 'Mean Squared Error') "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63eb6883",
   "metadata": {},
   "source": [
    "<h3>Check other variables quickly, with the residuals smooth plot<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "765006d3",
   "metadata": {},
   "source": [
    "<h3>Age check<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 176,
   "id": "b27fd4c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# smoothplot(data, Age.at.Sampling.stdz, resid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc70c2cd",
   "metadata": {},
   "source": [
    "<h3>Fine<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b11cdaa9",
   "metadata": {},
   "source": [
    "<h3>Check the fit of sex variable<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 177,
   "id": "856e3b36",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# smoothplot(data, Sex, resid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6178a20",
   "metadata": {},
   "source": [
    "<h3>Good<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee57917e",
   "metadata": {},
   "source": [
    "<h2>Conclusion: model fit is acceptable<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "390dd13d",
   "metadata": {},
   "source": [
    "<h2>Figure 3A for alpha diversity and livestock results - this is a figure no longer used after the first review<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 178,
   "id": "5dd717ef",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# library(ggplot2)\n",
    "\n",
    "# alpha_means <- setNames(\n",
    "#   aggregate(Alpha.Diversity ~ Livestock_avg_of_avgs, data, mean),\n",
    "#   c(\"Livestock_avg_of_avgs\", \"Mean_alpha_diversity\")\n",
    "# )\n",
    "\n",
    "# alpha_medians <- setNames(\n",
    "#   aggregate(Alpha.Diversity ~ Livestock_avg_of_avgs, data, median),\n",
    "#   c(\"Livestock_avg_of_avgs\", \"Median_alpha_diversity\")\n",
    "# )\n",
    "\n",
    "# fig3a <- ggplot(data, aes(Livestock_avg_of_avgs, Alpha.Diversity)) + geom_point(colour = \"grey\") + geom_smooth(method = \"lm\", colour=\"blue\") + \n",
    "#   theme_bw() +\n",
    "#   theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +\n",
    "#   labs(y= \"Alpha diversity\", x = \"Average livestock census count\") +\n",
    "#   theme(axis.text = element_text( \n",
    "#     size=14, \n",
    "#     face=3)) +\n",
    "#   theme(axis.title.x = element_text(face=\"bold\", size=16)) +\n",
    "#   theme(axis.title.y = element_text(face=\"bold\", size=16)) +\n",
    "#   theme(axis.title.x = element_text(vjust=-1)) +\n",
    "#   theme(axis.title.y = element_text(vjust=1.5)) +\n",
    "#   theme(axis.line = element_line(linewidth = 1, colour = \"black\")) +\n",
    "#   theme(plot.margin = margin(1,1,1,1, \"cm\"))\n",
    "\n",
    "# fig3a"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f978dd9f",
   "metadata": {},
   "source": [
    "<h2>Figure 3B (zoomed in on Figure 3A line) - this is a figure no longer used after the first review<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 179,
   "id": "410f8406",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# fig3b <- ggplot(data, aes(Livestock_avg_of_avgs, Alpha.Diversity)) + geom_point(colour = \"grey\") + geom_smooth(method = \"lm\", colour=\"blue\") + \n",
    "#   theme_bw() +\n",
    "#   theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +\n",
    "#   labs(y= \"Alpha diversity\", x = \"Average livestock census count\") +\n",
    "#   theme(axis.text = element_text( \n",
    "#     size=14, \n",
    "#     face=3)) +\n",
    "#   theme(axis.title.x = element_text(face=\"bold\", size=16)) +\n",
    "#   theme(axis.title.y = element_text(face=\"bold\", size=16)) +\n",
    "#   theme(axis.title.x = element_text(vjust=-1)) +\n",
    "#   theme(axis.title.y = element_text(vjust=2)) +\n",
    "#   theme(axis.line = element_line(linewidth = 1, colour = \"black\")) +\n",
    "#   coord_cartesian(ylim=c(0,500)) +\n",
    "#   theme(plot.margin = margin(1,1,1,1, \"cm\"))\n",
    "\n",
    "\n",
    "# fig3b"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd947393",
   "metadata": {},
   "source": [
    "<h3>Figure 4A for alpha diversity and grazing results - this is a figure no longer used after the first review<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 180,
   "id": "37bf30cf",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# fig4a <- ggplot(data, aes(Grazing, Alpha.Diversity, fill=Grazing)) + geom_boxplot(outlier.colour=\"black\", outlier.shape=16,\n",
    "#              outlier.size=2, notch=TRUE) + \n",
    "#   theme_bw() +\n",
    "#   theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +\n",
    "#   labs(y= \"Alpha diversity\", x = \"Diet\") +\n",
    "#   theme(axis.text = element_text( \n",
    "#     size=14, \n",
    "#     face=3)) +\n",
    "#   theme(axis.title.x = element_text(face=\"bold\", size=16)) +\n",
    "#   theme(axis.title.y = element_text(face=\"bold\", size=16)) +\n",
    "#   theme(axis.title.x = element_text(vjust=-1)) +\n",
    "#   theme(axis.title.y = element_text(vjust=1.5)) +\n",
    "#   scale_x_discrete(labels=c('mostly browse', 'substantial grass')) +\n",
    "#   theme(axis.line = element_line(linewidth = 1, colour = \"black\")) +\n",
    "#   theme(plot.margin = margin(1,1,1,1, \"cm\")) +\n",
    "#   theme(legend.position=\"none\") +\n",
    "#   scale_fill_manual(values=c(\"lightblue4\",\"lightblue\"))\n",
    "\n",
    "\n",
    "# fig4a"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3904e2d",
   "metadata": {},
   "source": [
    "<h3>Figure 4B (Zoomed in on Figure 4A) - this is a figure no longer used after the first review<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 181,
   "id": "f770ae7b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# fig4b <- ggplot(data, aes(Grazing, Alpha.Diversity, fill=Grazing)) + geom_boxplot(outlier.colour=\"black\", outlier.shape=16,\n",
    "#              outlier.size=2, notch=TRUE) + \n",
    "#   theme_bw() +\n",
    "#   theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +\n",
    "#   labs(y= \"Alpha diversity\", x = \"Diet\") +\n",
    "#   theme(axis.text = element_text( \n",
    "#     size=14, \n",
    "#     face=3)) +\n",
    "#   theme(axis.title.x = element_text(face=\"bold\", size=16)) +\n",
    "#   theme(axis.title.y = element_text(face=\"bold\", size=16)) +\n",
    "#   theme(axis.title.x = element_text(vjust=-1)) +\n",
    "#   theme(axis.title.y = element_text(vjust=1.5)) +\n",
    "#   scale_x_discrete(labels=c('mostly browse', 'substantial grass')) +\n",
    "#   theme(axis.line = element_line(linewidth = 1, colour = \"black\")) +\n",
    "#   theme(plot.margin = margin(1,1,1,1, \"cm\")) +\n",
    "#   coord_cartesian(ylim=c(260,280)) +\n",
    "#   theme(legend.position=\"none\") +\n",
    "#   scale_fill_manual(values=c(\"lightblue4\",\"lightblue\")) \n",
    "# #   geom_hline(yintercept = 264.58, color=\"purple\") +\n",
    "# # geom_hline(yintercept = 271.65, color=\"purple\")\n",
    "\n",
    "# fig4b"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4c7416b",
   "metadata": {},
   "source": [
    "<h2><font color='green'>Are there differences in beta diversity among samples, and what factors affect beta diversity?</font><h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b01d54c4",
   "metadata": {},
   "source": [
    "<h2>Now, we will use DEICODE to calculate beta diversity on the non-rarefied feature table (see Martino et al. 2019)<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0197192",
   "metadata": {},
   "source": [
    "<h2>The outputs are an ordination (biplot) and a distance matrix, calculated through robust Aitchison PCA (RPCA)<h2>"
   ]
  },
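  {
   "cell_type": "markdown",
   "id": "9e1c4a0f",
   "metadata": {},
   "source": [
    "<h4>Robust Aitchison PCA starts from a robust centered log-ratio (rclr) transform. Each sample's nonzero counts are log-transformed and centered on the mean log of that sample's nonzero counts. Zeros are left as missing values rather than replaced with a pseudocount. DEICODE then completes this sparse matrix with a low-rank factorization and derives sample distances from the resulting low-rank representation. The numpy sketch below covers only the rclr step, on a hypothetical count table. It is an illustration, not DEICODE's implementation.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e1c4a10",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def rclr(counts):\n",
    "    # Robust clr: log of observed (nonzero) values, centered per sample on the\n",
    "    # mean log of its nonzero values; zeros become NaN (missing), not pseudocounts\n",
    "    x = np.asarray(counts, dtype=float)\n",
    "    out = np.full_like(x, np.nan)\n",
    "    for i, row in enumerate(x):\n",
    "        mask = row > 0\n",
    "        logs = np.log(row[mask])\n",
    "        out[i, mask] = logs - logs.mean()\n",
    "    return out\n",
    "\n",
    "\n",
    "# Hypothetical table: rows are samples, columns are features (ASVs)\n",
    "table = np.array([[10, 0, 5, 85],\n",
    "                  [0, 20, 20, 60]])\n",
    "print(rclr(table))"
   ]
  },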
  {
   "cell_type": "code",
   "execution_count": 182,
   "id": "15cc2db9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime deicode rpca \\\n",
    "#     --i-table filtered_table_4.qza \\\n",
    "#     --o-biplot ordination.qza \\\n",
    "#     --o-distance-matrix distance.qza"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9696fa9a",
   "metadata": {},
   "source": [
    "<h1><font color='red'>The export and analysis below were completed prior to the first manuscript review. To find the new beta diversity analysis (which still uses the DEICODE output), search for \"New Beta Analysis\", which will take you to the new analysis near the end of this Jupyter Notebook</font><h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b75bc0a",
   "metadata": {},
   "source": [
    "<h3>Unzip .qza files for export</h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "236abcfe",
   "metadata": {},
   "source": [
    "<h4>First, the ordination file</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 183,
   "id": "28d62356",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !mv ordination.qza ordination.zip\n",
    "# !unzip ordination.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e36edbd6",
   "metadata": {},
   "source": [
    "<h4>Rename the randomly named folder (the artifact's UUID) created by unzipping the ordination file (this can also be done manually)</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 184,
   "id": "d69540bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !mv f735e3c3-1083-45b9-b7a9-c7498e9444eb ordination"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "596acd1e",
   "metadata": {},
   "source": [
    "<h4>Then, the distance file</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 185,
   "id": "717e8fa2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !mv distance.qza distance.zip\n",
    "# !unzip distance.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9b8ac3a",
   "metadata": {},
   "source": [
    "<h4>Rename the randomly named folder (the artifact's UUID) created by unzipping the distance file</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 186,
   "id": "70690b0e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !mv 2a050123-22ce-4d6e-9109-9ed6ef5c5942 distance"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a118ee6a",
   "metadata": {},
   "source": [
    "<h3>The distance-matrix.tsv file in distance/data was exported to an Excel file (titled \"distance-matrix\") in the Manuscript_1 folder for use in the PRIMER-e PERMANOVA+ software, which can include individual ID as a random effect in uneven longitudinal sampling designs. The metadata_2.csv file was also exported to an Excel file, titled metadata_2.xls. Further details of that analysis are given in the manuscript's additional files.</h3>"
   ]
  },
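  {
   "cell_type": "markdown",
   "id": "7f3a9c03",
   "metadata": {},
   "source": [
    "<h4>The cell below is a minimal Python sketch that reads distance-matrix.tsv directly out of distance.qza, as an alternative to the rename-and-unzip steps above. It is not the workflow used for the manuscript. A .qza file is a zip archive whose top-level folder is named with the artifact's UUID, so the sketch locates the table by the end of its path instead of by folder name. The helper name read_qza_tsv is hypothetical. The commented-out Excel export assumes the openpyxl package is installed.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f3a9c04",
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "import zipfile\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "def read_qza_tsv(qza_path, member_suffix='/data/distance-matrix.tsv'):\n",
    "    # Hypothetical helper: find the one archive member ending in member_suffix,\n",
    "    # whatever the UUID-named top-level folder is called\n",
    "    with zipfile.ZipFile(qza_path) as archive:\n",
    "        matches = [name for name in archive.namelist() if name.endswith(member_suffix)]\n",
    "        if len(matches) != 1:\n",
    "            raise FileNotFoundError(f'expected one {member_suffix} in {qza_path}, found {len(matches)}')\n",
    "        with archive.open(matches[0]) as raw:\n",
    "            # The first column holds the sample IDs of the square distance matrix\n",
    "            return pd.read_csv(io.TextIOWrapper(raw, encoding='utf-8'), sep='\\t', index_col=0)\n",
    "\n",
    "# Example usage (commented out, like the export cells above):\n",
    "# dm = read_qza_tsv('distance.qza')\n",
    "# dm.to_excel('distance-matrix.xlsx')  # requires openpyxl"
   ]
  },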
  {
   "cell_type": "markdown",
   "id": "b1c5291a",
   "metadata": {},
   "source": [
    "<h2><font color='green'>Given the GAMM results (see further below; they come from an analysis run before the first manuscript review, but this question was retained alongside the new results), can q2-longitudinal show which features change with livestock abundance or age?</font></h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "637d4720",
   "metadata": {},
   "source": [
    "<h2>Check feature volatility with respect to livestock abundance</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 187,
   "id": "e1c0b007",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime longitudinal feature-volatility \\\n",
    "#   --i-table genus_collapsed_table.qza  \\\n",
    "#   --m-metadata-file metadata_3.tsv \\\n",
    "#   --p-state-column Livestock_avg_of_avgs \\\n",
    "# --p-individual-id-column Elephant_ID \\\n",
    "# --p-feature-count 10 \\\n",
    "# --output-dir feature-volatility-livestock"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b231a4a1",
   "metadata": {},
   "source": [
    "<h2>View feature volatility plot with respect to livestock</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 188,
   "id": "4dee3d0d",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "viz_fvol_lstk = Visualization.load('feature-volatility-livestock/volatility_plot.qzv')\n",
    "viz_fvol_lstk"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ade05fbd",
   "metadata": {},
   "source": [
    "<h2>Repeat with respect to age\n",
    "    (these age q2-longitudinal analyses were run before the first review and are no longer relevant)</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 189,
   "id": "8e8e2710",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime longitudinal feature-volatility \\\n",
    "#   --i-table genus_collapsed_table.qza  \\\n",
    "#   --m-metadata-file metadata_3.tsv \\\n",
    "#   --p-state-column Age.at.Sampling \\\n",
    "# --p-individual-id-column Elephant_ID \\\n",
    "# --p-feature-count 10 \\\n",
    "# --output-dir feature-volatility-age"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e354ef92",
   "metadata": {},
   "source": [
    "<h2>View feature volatility plot with respect to age</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 190,
   "id": "422c69a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# viz_fvol_age = Visualization.load('feature-volatility-age/volatility_plot.qzv')\n",
    "# viz_fvol_age"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b04270f6",
   "metadata": {},
   "source": [
    "<h3>Nothing shows up on the age plot, so I will move up to the order level, following this forum post, in the hope of simplifying the data: https://forum.qiime2.org/t/feature-volatility-plot-not-showing-in-qiime-2-view/22472</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 191,
   "id": "878886da",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !qiime longitudinal feature-volatility \\\n",
    "#   --i-table order_collapsed_table.qza  \\\n",
    "#   --m-metadata-file metadata_3.tsv \\\n",
    "#   --p-state-column Age.at.Sampling \\\n",
    "# --p-individual-id-column Elephant_ID \\\n",
    "# --p-feature-count 10 \\\n",
    "# --output-dir feature-volatility-age-order"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29528eb7",
   "metadata": {},
   "source": [
    "<h2>View feature volatility plot with respect to age at the order level</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 192,
   "id": "eca97faa",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# viz_fvol_age_order = Visualization.load('feature-volatility-age-order/volatility_plot.qzv')\n",
    "# viz_fvol_age_order"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1475c6c0",
   "metadata": {},
   "source": [
    "<h2>The visualization is still too cluttered. I will create a new metadata file, metadata_age_yrs, with an age column rounded to the nearest year (Age.at.Sampling.year) to simplify the x-axis, then repeat the analysis at the genus level.</h2>"
   ]
  },
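  {
   "cell_type": "markdown",
   "id": "7f3a9c05",
   "metadata": {},
   "source": [
    "<h4>The cell below is a minimal pandas sketch of how the rounded-age column could be built. It assumes that metadata_3.tsv is a plain tab-separated table with a numeric Age.at.Sampling column in years and no #q2:types row. The helper name add_age_year is hypothetical. This sketch rounds halves up, whereas pandas' own round() would round halves to the nearest even number.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f3a9c06",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "def add_age_year(meta, age_col='Age.at.Sampling', new_col='Age.at.Sampling.year'):\n",
    "    # Hypothetical helper: round age (in years) to the nearest whole year,\n",
    "    # rounding halves up and keeping missing ages missing\n",
    "    out = meta.copy()\n",
    "    ages = pd.to_numeric(out[age_col], errors='coerce')\n",
    "    out[new_col] = np.floor(ages + 0.5).astype('Int64')\n",
    "    return out\n",
    "\n",
    "# Example usage (commented out; assumes the file layout described above):\n",
    "# meta = pd.read_csv('metadata_3.tsv', sep='\\t')\n",
    "# add_age_year(meta).to_csv('metadata_age_yrs.tsv', sep='\\t', index=False)"
   ]
  },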
  {
   "cell_type": "code",
   "execution_count": 193,
   "id": "480cc51e",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# !qiime longitudinal feature-volatility \\\n",
    "#   --i-table genus_collapsed_table.qza  \\\n",
    "#   --m-metadata-file metadata_age_yrs.tsv \\\n",
    "#   --p-state-column Age.at.Sampling.year \\\n",
    "# --p-individual-id-column Elephant_ID \\\n",
    "# --p-feature-count 10 \\\n",
    "# --output-dir feature-volatility-age-yrs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0072b0e4",
   "metadata": {},
   "source": [
    "<h2>View feature volatility plot with respect to age collapsed to years (genus level again)</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 194,
   "id": "afe37793",
   "metadata": {},
   "outputs": [],
   "source": [
    "# viz_fvol_age_yrs = Visualization.load('feature-volatility-age-yrs/volatility_plot.qzv')\n",
    "# viz_fvol_age_yrs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02e8f269",
   "metadata": {},
   "source": [
    "<h2>Still too cluttered, so moving on</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5a1c250",
   "metadata": {},
   "source": [
    "<h2><font color='purple'>Below is the remaining code for the stacked bar plot supplementary figure</font></h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05ec7007",
   "metadata": {},
   "source": [
    "<h2>First, we reload the data in the same way as further up, when the supplementary figure was first prepared.</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 195,
   "id": "660a0f77",
   "metadata": {},
   "outputs": [],
   "source": [
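    "%%R\n",
    "\n",
    "# Load the packages used in this cell and the plotting cells below\n",
    "library(ggplot2)\n",
    "library(dplyr)\n",
    "library(tidyr)\n",
    "library(forcats)\n",
    "library(ggh4x)\n",
    "\n",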
    "data <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/rel-phyla-table.csv\")\n",
    "data=data.frame(data)\n",
    "names(data) <- gsub(\".\", \"-\", names(data), fixed=TRUE)\n",
    "data <- data%>%\n",
    "  pivot_longer(cols=c(-1),names_to=\"index\")%>%\n",
    "  pivot_wider(names_from=c(1))\n",
    "data <- data %>%\n",
    "  rename(\n",
    "    Firmicutes = `d__Bacteria;p__Firmicutes`,\n",
    "    Thermoplasmatota = `d__Archaea;p__Thermoplasmatota`,\n",
    "    Bacteroidota = `d__Bacteria;p__Bacteroidota`,\n",
    "    Verrucomicrobiota = `d__Bacteria;p__Verrucomicrobiota`,\n",
    "    Myxococcota = `d__Bacteria;p__Myxococcota`,\n",
    "    Spirochaetota = `d__Bacteria;p__Spirochaetota`,\n",
    "    Armatimonadota = `d__Bacteria;p__Armatimonadota`,\n",
    "    Proteobacteria = `d__Bacteria;p__Proteobacteria`,\n",
    "    Halobacterota = `d__Archaea;p__Halobacterota`,\n",
    "    Patescibacteria = `d__Bacteria;p__Patescibacteria`,\n",
    "    Euryarchaeota = `d__Archaea;p__Euryarchaeota`,\n",
    "    Fibrobacterota = `d__Bacteria;p__Fibrobacterota`,\n",
    "    Bdellovibrionota = `d__Bacteria;p__Bdellovibrionota`,\n",
    "    Cyanobacteria = `d__Bacteria;p__Cyanobacteria`,\n",
    "    Desulfobacterota = `d__Bacteria;p__Desulfobacterota`,\n",
    "    Planctomycetota = `d__Bacteria;p__Planctomycetota`,\n",
    "    Synergistota = `d__Bacteria;p__Synergistota`,\n",
    "    Elusimicrobiota = `d__Bacteria;p__Elusimicrobiota`,\n",
    "    SAR324_clade.Marine_group_B = `d__Bacteria;p__SAR324_clade(Marine_group_B)`,\n",
    "    Chloroflexi = `d__Bacteria;p__Chloroflexi`,\n",
    "    Crenarchaeota = `d__Archaea;p__Crenarchaeota`,\n",
    "    Fusobacteriota = `d__Bacteria;p__Fusobacteriota`,\n",
    "    Actinobacteriota = `d__Bacteria;p__Actinobacteriota`,\n",
    "    Methylomirabilota = `d__Bacteria;p__Methylomirabilota`,\n",
    "    Parabasalia = `d__Eukaryota;p__Parabasalia`,\n",
    "    Deinococcota = `d__Bacteria;p__Deinococcota`,\n",
    "    Campilobacterota = `d__Bacteria;p__Campilobacterota`,\n",
    "    Sumerlaeota = `d__Bacteria;p__Sumerlaeota`,\n",
    "    Gemmatimonadota = `d__Bacteria;p__Gemmatimonadota`,\n",
    "    WPS.2 = `d__Bacteria;p__WPS-2`,\n",
    "    Deferribacterota = `d__Bacteria;p__Deferribacterota`,\n",
    "    NB1.j = `d__Bacteria;p__NB1-j`,\n",
    "    Acidobacteriota = `d__Bacteria;p__Acidobacteriota` \n",
    "  )\n",
    "data <-data%>%\n",
    "  pivot_longer(-index, names_to = \"Phylum\", values_to = \"Relative_Frequency\")\n",
    "\n",
    "meta_cols <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/level-2.csv\")\n",
    "meta_cols <- as.data.frame(meta_cols)\n",
    "meta_cols <- select(meta_cols, c(index,Elephant_ID,Family,Date.Sampled,AgeClass,Sex,Season,Age.at.Sampling))\n",
    "library(lubridate)\n",
    "meta_cols <- meta_cols %>% \n",
    "  mutate(Date.Sampled = lubridate::mdy(Date.Sampled))\n",
    "data <- data %>%\n",
    "  left_join(meta_cols, by = \"index\")\n",
    "phyla_order <- c(\"Firmicutes\",\n",
    "                 \"Bacteroidota\",\n",
    "                 \"Euryarchaeota\",\n",
    "                 \"Actinobacteriota\",\n",
    "                 \"Proteobacteria\",\n",
    "                 \"Halobacterota\",\n",
    "                 \"Verrucomicrobiota\",\n",
    "                 \"Planctomycetota\",\n",
    "                 \"Spirochaetota\",\n",
    "                 \"Armatimonadota\",\n",
    "                 \"Synergistota\",\n",
    "                 \"Chloroflexi\",\n",
    "                 \"Fibrobacterota\",\n",
    "                 \"Desulfobacterota\",\n",
    "                 \"Thermoplasmatota\",\n",
    "                 \"Cyanobacteria\",\n",
    "                 \"Patescibacteria\",\n",
    "                 \"Bdellovibrionota\",\n",
    "                 \"Elusimicrobiota\",\n",
    "                 \"WPS.2\",\n",
    "                 \"Myxococcota\",\n",
    "                 \"SAR324_clade.Marine_group_B\",\n",
    "                 \"Parabasalia\",\n",
    "                 \"Fusobacteriota\",\n",
    "                 \"Acidobacteriota\",\n",
    "                 \"Campilobacterota\",\n",
    "                 \"Deferribacterota\",\n",
    "                 \"Gemmatimonadota\",\n",
    "                 \"Crenarchaeota\",\n",
    "                 \"Deinococcota\",\n",
    "                 \"Methylomirabilota\",\n",
    "                 \"NB1.j\",\n",
    "                 \"Sumerlaeota\")\n",
    "\n",
    "data <- data %>%\n",
    "  mutate(Phylum = factor(Phylum, levels=phyla_order))\n",
    "\n",
    "data <- data %>%\n",
    "  mutate(Phylum = fct_other(Phylum,\n",
    "                            drop = c( \"Chloroflexi\",\n",
    "                                      \"Fibrobacterota\",\n",
    "                                      \"Desulfobacterota\",\n",
    "                                      \"Thermoplasmatota\",\n",
    "                                      \"Cyanobacteria\",\n",
    "                                      \"Patescibacteria\",\n",
    "                                      \"Bdellovibrionota\",\n",
    "                                      \"Elusimicrobiota\",\n",
    "                                      \"WPS.2\",\n",
    "                                      \"Myxococcota\",\n",
    "                                      \"SAR324_clade.Marine_group_B\",\n",
    "                                      \"Parabasalia\",\n",
    "                                      \"Fusobacteriota\",\n",
    "                                      \"Acidobacteriota\",\n",
    "                                      \"Campilobacterota\",\n",
    "                                      \"Deferribacterota\",\n",
    "                                      \"Gemmatimonadota\",\n",
    "                                      \"Crenarchaeota\",\n",
    "                                      \"Deinococcota\",\n",
    "                                      \"Methylomirabilota\",\n",
    "                                      \"NB1.j\",\n",
    "                                      \"Sumerlaeota\")))\n",
    "\n",
    "library(ggh4x)\n",
    "library(forcats)\n"
   ]
  },
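  {
   "cell_type": "markdown",
   "id": "7f3a9c07",
   "metadata": {},
   "source": [
    "<h4>In the cell above, fct_other() keeps 11 named phyla and lumps the rest into a single \"Other\" level. That gives 12 fill categories, the largest number of colors the ColorBrewer \"Paired\" palette provides. The cell below is a hypothetical Python sketch of how a similar top-11 ordering could be derived from a long table (columns Phylum and Relative_Frequency), ranked by mean relative abundance across samples, instead of listing phyla by hand. The notebook does not state how phyla_order was chosen. The helper name lump_phyla is an assumption.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f3a9c08",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "def lump_phyla(long_df, n_keep=11, other='Other'):\n",
    "    # Hypothetical helper: rank phyla by mean relative abundance across samples,\n",
    "    # keep the top n_keep, and relabel every other phylum as `other`\n",
    "    out = long_df.copy()\n",
    "    out['Phylum'] = out['Phylum'].astype(str)\n",
    "    ranking = (out.groupby('Phylum')['Relative_Frequency']\n",
    "               .mean()\n",
    "               .sort_values(ascending=False))\n",
    "    keep = list(ranking.index[:n_keep])\n",
    "    lumped = out['Phylum'].where(out['Phylum'].isin(keep), other)\n",
    "    # Ordered levels: kept phyla from most to least abundant, then the lumped level\n",
    "    out['Phylum'] = pd.Categorical(lumped, categories=keep + [other], ordered=True)\n",
    "    return out"
   ]
  },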
  {
   "cell_type": "markdown",
   "id": "d4b2bc5f",
   "metadata": {},
   "source": [
    "<h3>These plots were copied from R and assembled into the supplementary figure in Word, with one legend per page. The blue headings below give the individual elephant IDs.</h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f515db24",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>B1205</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 196,
   "id": "cf743ebb",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "b1205 <- filter(data, Elephant_ID == \"B1205\")\n",
    "\n",
    "p_b1205 <- ggplot(b1205, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        panel.spacing = unit(0, \"lines\"),\n",
    "        strip.text.x = element_text(face = \"bold\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_b1205  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b68ca0d",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>B1207</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 197,
   "id": "88c45704",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "b1207 <- filter(data, Elephant_ID == \"B1207\")\n",
    "\n",
    "p_b1207 <- ggplot(b1207, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_b1207  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a21fd847",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>B1211</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 198,
   "id": "52d74dbe",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "b1211 <- filter(data, Elephant_ID == \"B1211\")\n",
    "\n",
    "p_b1211 <- ggplot(b1211, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_b1211  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13d66316",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>B1216</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 199,
   "id": "5b8968d9",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "b1216 <- filter(data, Elephant_ID == \"B1216\")\n",
    "\n",
    "p_b1216 <- ggplot(b1216, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette = \"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_b1216"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "263dd895",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>B1285</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 200,
   "id": "4593d471",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "b1285 <- filter(data, Elephant_ID == \"B1285\")\n",
    "\n",
    "p_b1285 <- ggplot(b1285, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette = \"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_b1285\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "621fe139",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>B1330</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 201,
   "id": "e40a7b0c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "b1330 <- filter(data, Elephant_ID == \"B1330\")\n",
    "\n",
    "p_b1330 <- ggplot(b1330, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_b1330"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b15322a",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M2.03</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 202,
   "id": "301a60c0",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m2.03 <- filter(data, Elephant_ID == \"M2.03\")\n",
    "\n",
    "p_m2.03 <- ggplot(m2.03, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0, \"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m2.03"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63a947b4",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M4.02</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 203,
   "id": "edac1e64",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m4.02 <- filter(data, Elephant_ID == \"M4.02\")\n",
    "\n",
    "p_m4.02 <- ggplot(m4.02, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m4.02\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dedfa313",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M6.04</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 204,
   "id": "5eb500be",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m6.04 <- filter(data, Elephant_ID == \"M6.04\")\n",
    "\n",
    "p_m6.04 <- ggplot(m6.04, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m6.04"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60bab1cd",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M6.99</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 205,
   "id": "e03bb1de",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m6.99 <- filter(data, Elephant_ID == \"M6.99\")\n",
    "\n",
    "p_m6.99 <- ggplot(m6.99, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m6.99"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2be0a58e",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M6.9913</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 206,
   "id": "1a776f9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m6.9913 <- filter(data, Elephant_ID == \"M6.9913\")\n",
    "\n",
    "p_m6.9913 <- ggplot(m6.9913, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m6.9913"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc00b105",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M7.8905</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 207,
   "id": "38ad9ad4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m7.8905 <- filter(data, Elephant_ID == \"M7.8905\")\n",
    "\n",
    "p_m7.8905 <- ggplot(m7.8905, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m7.8905"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cff8d05",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M9.02</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 208,
   "id": "7379cb2b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m9.02 <- filter(data, Elephant_ID == \"M9.02\")\n",
    "\n",
    "p_m9.02 <- ggplot(m9.02, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m9.02"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "480a1117",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M24.9004</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 209,
   "id": "a4ba9d51",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#M24.9004\n",
    "\n",
    "m24.9004 <- filter(data, Elephant_ID == \"M24.9004\")\n",
    "\n",
    "p_m24.9004 <- ggplot(m24.9004, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m24.9004\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf4253d3",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M25.00</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 210,
   "id": "aa358e79",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m25.00 <- filter(data, Elephant_ID == \"M25.00\")\n",
    "\n",
    "p_m25.00 <- ggplot(m25.00, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m25.00"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6bf3613",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M25.0012</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 211,
   "id": "01797e37",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m25.0012 <- filter(data, Elephant_ID == \"M25.0012\")\n",
    "\n",
    "p_m25.0012 <- ggplot(m25.0012, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m25.0012"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6e846cd",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M26.05</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 212,
   "id": "0beabd44",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m26.05 <- filter(data, Elephant_ID == \"M26.05\")\n",
    "\n",
    "p_m26.05 <- ggplot(m26.05, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m26.05"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa1c8b2f",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M31.04</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 213,
   "id": "63f84305",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m31.04 <- filter(data, Elephant_ID == \"M31.04\")\n",
    "\n",
    "p_m31.04 <- ggplot(m31.04, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m31.04"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65a49499",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M32.04</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 214,
   "id": "8332253d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m32.04 <- filter(data, Elephant_ID == \"M32.04\")\n",
    "\n",
    "p_m32.04 <- ggplot(m32.04, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m32.04"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "592c874e",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M45.01</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 215,
   "id": "9a7600cb",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m45.01 <- filter(data, Elephant_ID == \"M45.01\")\n",
    "\n",
    "p_m45.01 <- ggplot(m45.01, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m45.01"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08cb1bb7",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M45.0115</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 216,
   "id": "c13b30cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m45.0115 <- filter(data, Elephant_ID == \"M45.0115\")\n",
    "\n",
    "p_m45.0115 <- ggplot(m45.0115, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m45.0115"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c08a7135",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M53.95B</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 217,
   "id": "a2fe07cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m53.95b <- filter(data, Elephant_ID == \"M53.95B\")\n",
    "\n",
    "p_m53.95b <- ggplot(m53.95b, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m53.95b"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2663c612",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M54.00</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 218,
   "id": "202ba3d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m54.00 <- filter(data, Elephant_ID == \"M54.00\")\n",
    "\n",
    "p_m54.00 <- ggplot(m54.00, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m54.00"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "597955a7",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M63.01</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 219,
   "id": "47a97b10",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m63.01 <- filter(data, Elephant_ID == \"M63.01\")\n",
    "\n",
    "p_m63.01 <- ggplot(m63.01, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m63.01"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f402866",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M63.0113</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 220,
   "id": "4bad5d59",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m63.0113 <- filter(data, Elephant_ID == \"M63.0113\")\n",
    "\n",
    "p_m63.0113 <- ggplot(m63.0113, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m63.0113"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b11218fb",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M63.94</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 221,
   "id": "4adb4dad",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m63.94 <- filter(data, Elephant_ID == \"M63.94\")\n",
    "\n",
    "p_m63.94 <- ggplot(m63.94, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m63.94"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c52c55a9",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M63.9410</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 222,
   "id": "ad52bef1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m63.9410 <- filter(data, Elephant_ID == \"M63.9410\")\n",
    "\n",
    "p_m63.9410 <- ggplot(m63.9410, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m63.9410"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cacb2141",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M64.95</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 223,
   "id": "c1fe36e3",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m64.95 <- filter(data, Elephant_ID == \"M64.95\")\n",
    "\n",
    "p_m64.95 <- ggplot(m64.95, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m64.95"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16a57eca",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M64.9512</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 224,
   "id": "32be76f5",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m64.9512 <- filter(data, Elephant_ID == \"M64.9512\")\n",
    "\n",
    "p_m64.9512 <- ggplot(m64.9512, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m64.9512"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1abb799",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>M65.9305</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 225,
   "id": "1aab43bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "m65.9305 <- filter(data, Elephant_ID == \"M65.9305\")\n",
    "\n",
    "p_m65.9305 <- ggplot(m65.9305, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_m65.9305"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd2619a7",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R4.00</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 226,
   "id": "026866c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r4.00 <- filter(data, Elephant_ID == \"R4.00\")\n",
    "\n",
    "p_r4.00 <- ggplot(r4.00, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r4.00"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab18ae26",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R7.9203</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 227,
   "id": "02dd29cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r7.9203 <- filter(data, Elephant_ID == \"R7.9203\")\n",
    "\n",
    "p_r7.9203 <- ggplot(r7.9203, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r7.9203"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b955fe12",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R8.00</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 228,
   "id": "17bc2406",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r8.00 <- filter(data, Elephant_ID == \"R8.00\")\n",
    "\n",
    "p_r8.00 <- ggplot(r8.00, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r8.00"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ae5f5e1",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R8.0013</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 229,
   "id": "cd69f23d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r8.0013 <- filter(data, Elephant_ID == \"R8.0013\")\n",
    "\n",
    "p_r8.0013 <- ggplot(r8.0013, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r8.0013"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96fabd2f",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R17.08</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 230,
   "id": "3f1527bc",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r17.08 <- filter(data, Elephant_ID == \"R17.08\")\n",
    "\n",
    "p_r17.08 <- ggplot(r17.08, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r17.08"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0316575",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R17.04</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 231,
   "id": "3bcf7ab8",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r17.04 <- filter(data, Elephant_ID == \"R17.04\")\n",
    "\n",
    "p_r17.04 <- ggplot(r17.04, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r17.04"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "443d1d8d",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R19.8801</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 232,
   "id": "d0ba8b6d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r19.8801 <- filter(data, Elephant_ID == \"R19.8801\")\n",
    "\n",
    "p_r19.8801 <- ggplot(r19.8801, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r19.8801"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f82e7acf",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R19.880114</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 233,
   "id": "22d576de",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r19.880114 <- filter(data, Elephant_ID == \"R19.880114\")\n",
    "\n",
    "p_r19.880114 <- ggplot(r19.880114, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r19.880114"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "990480fc",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R21.08</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 234,
   "id": "1b68ef3d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r21.08 <- filter(data, Elephant_ID == \"R21.08\")\n",
    "\n",
    "p_r21.08 <- ggplot(r21.08, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r21.08"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "552d2d31",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R21.04</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 235,
   "id": "96de58d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r21.04 <- filter(data, Elephant_ID == \"R21.04\")\n",
    "\n",
    "p_r21.04 <- ggplot(r21.04, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r21.04"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c0d032f",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R22.06</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 236,
   "id": "f975e17e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r22.06 <- filter(data, Elephant_ID == \"R22.06\")\n",
    "\n",
    "p_r22.06 <- ggplot(r22.06, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r22.06"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6964656",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R22.03</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 237,
   "id": "8879ba55",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r22.03 <- filter(data, Elephant_ID == \"R22.03\")\n",
    "\n",
    "p_r22.03 <- ggplot(r22.03, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r22.03"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d2f9a87",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R22.8904</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 238,
   "id": "e7e1b7f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r22.8904 <- filter(data, Elephant_ID == \"R22.8904\")\n",
    "\n",
    "p_r22.8904 <- ggplot(r22.8904, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r22.8904"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27cf3bf1",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R24.04</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 239,
   "id": "c7f2e6c7",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r24.04 <- filter(data, Elephant_ID == \"R24.04\")\n",
    "\n",
    "p_r24.04 <- ggplot(r24.04, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r24.04"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62a792a3",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R25.03</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 240,
   "id": "5ea65b09",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r25.03 <- filter(data, Elephant_ID == \"R25.03\")\n",
    "\n",
    "p_r25.03 <- ggplot(r25.03, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r25.03"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "551499e5",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R25.9002</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 241,
   "id": "df51e642",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r25.9002 <- filter(data, Elephant_ID == \"R25.9002\")\n",
    "\n",
    "p_r25.9002 <- ggplot(r25.9002, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r25.9002"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "646948c8",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R25.9613</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 242,
   "id": "a5c59565",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r25.9613 <- filter(data, Elephant_ID == \"R25.9613\")\n",
    "\n",
    "p_r25.9613 <- ggplot(r25.9613, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r25.9613"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3d85161",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R26.00</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 243,
   "id": "33970b6e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r26.00 <- filter(data, Elephant_ID == \"R26.00\")\n",
    "\n",
    "p_r26.00 <- ggplot(r26.00, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r26.00"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3902b6e",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R26.0013</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 244,
   "id": "8c5bcb3b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r26.0013 <- filter(data, Elephant_ID == \"R26.0013\")\n",
    "\n",
    "p_r26.0013 <- ggplot(r26.0013, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r26.0013"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7e978c8",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R27.04</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 245,
   "id": "6f1624d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r27.04 <- filter(data, Elephant_ID == \"R27.04\")\n",
    "\n",
    "p_r27.04 <- ggplot(r27.04, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r27.04"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1af2d52",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R27.8904</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 246,
   "id": "0bd1e14d",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r27.8904 <- filter(data, Elephant_ID == \"R27.8904\")\n",
    "\n",
    "p_r27.8904 <- ggplot(r27.8904, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r27.8904"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8202d692",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R28.03</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 247,
   "id": "032f203b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r28.03 <- filter(data, Elephant_ID == \"R28.03\")\n",
    "\n",
    "p_r28.03 <- ggplot(r28.03, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r28.03"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e645e732",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R28.0315</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 248,
   "id": "d37055dc",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r28.0315 <- filter(data, Elephant_ID == \"R28.0315\")\n",
    "\n",
    "p_r28.0315 <- ggplot(r28.0315, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r28.0315"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "abd834b1",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R28.99</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 249,
   "id": "db3d5731",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r28.99 <- filter(data, Elephant_ID == \"R28.99\")\n",
    "\n",
    "p_r28.99 <- ggplot(r28.99, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r28.99"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb288894",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R28.9912</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 250,
   "id": "14cc6abf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r28.9912 <- filter(data, Elephant_ID == \"R28.9912\")\n",
    "\n",
    "p_r28.9912 <- ggplot(r28.9912, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r28.9912"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e78483b9",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R29.03</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 251,
   "id": "1dec2dfe",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r29.03 <- filter(data, Elephant_ID == \"R29.03\")\n",
    "\n",
    "p_r29.03 <- ggplot(r29.03, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r29.03"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cef7f887",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R36.06</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 252,
   "id": "765b7bf0",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r36.06 <- filter(data, Elephant_ID == \"R36.06\")\n",
    "\n",
    "p_r36.06 <- ggplot(r36.06, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r36.06"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4759b758",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R36.03</font><h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 253,
   "id": "ef3a6325",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r36.03 <- filter(data, Elephant_ID == \"R36.03\")\n",
    "\n",
    "p_r36.03 <- ggplot(r36.03, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r36.03"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "722f2851",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R37.04</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 254,
   "id": "ced91b82",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r37.04 <- filter(data, Elephant_ID == \"R37.04\")\n",
    "\n",
    "p_r37.04 <- ggplot(r37.04, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r37.04"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7fa49a9",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R37.9107</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 255,
   "id": "c213079f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r37.9107 <- filter(data, Elephant_ID == \"R37.9107\")\n",
    "\n",
    "p_r37.9107 <- ggplot(r37.9107, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r37.9107"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "422a0d48",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>R38.01</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 256,
   "id": "77b96c62",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "r38.01 <- filter(data, Elephant_ID == \"R38.01\")\n",
    "\n",
    "p_r38.01 <- ggplot(r38.01, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_r38.01"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6e0463b",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>S92.06</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 257,
   "id": "c8037f51",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "s92.06 <- filter(data, Elephant_ID == \"S92.06\")\n",
    "\n",
    "p_s92.06 <- ggplot(s92.06, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_s92.06"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05df16e5",
   "metadata": {},
   "source": [
    "<h4><font color='blue'>S92.06B</font></h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 258,
   "id": "d1fb77a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "s92.06b <- filter(data, Elephant_ID == \"Zodiacs.B\")\n",
    "\n",
    "p_s92.06b <- ggplot(s92.06b, aes(x=index, y=Relative_Frequency, width = 1)) +\n",
    "  facet_nested(~Elephant_ID + Sex + Age.at.Sampling + Date.Sampled, scales = \"free_x\", space = \"free_x\") +\n",
    "  geom_bar(aes(fill=Phylum), stat = \"identity\", position = \"fill\") +\n",
    "  scale_fill_brewer(palette=\"Paired\") +\n",
    "  scale_y_continuous(name = \"Relative abundance\",\n",
    "                     labels = scales::percent,\n",
    "                     expand = c(0,0)) +\n",
    "  theme_bw() +\n",
    "  theme(axis.title.x=element_blank(),\n",
    "        axis.text.x=element_blank(),\n",
    "        axis.ticks.x=element_blank(),\n",
    "        panel.border = element_blank(),\n",
    "        panel.background = element_blank(),\n",
    "        strip.background = element_rect(color = \"black\"),\n",
    "        strip.text.x = element_text( face = \"bold\"),\n",
    "        panel.spacing = unit(0,\"lines\")) +\n",
    "  force_panelsizes(cols = unit(1.75, \"cm\"))\n",
    "\n",
    "p_s92.06b"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6490afbf",
   "metadata": {},
   "source": [
    "<h1><font color='red'>Post-Review Alpha Diversity Analysis: The revised analysis and additional elements below were added in response to reviewer comments on the original manuscript.</font></h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82e4720b",
   "metadata": {},
   "source": [
    "<h2>We reanalyzed the data to address several reviewer concerns about the original analysis. The first concern was data transformation: in the analysis below, the response variable (estimated alpha diversity) is not transformed. The second concern was that temporal autocorrelation was not accounted for; the new analysis accounts for it. Time is also added as a general covariate. As a reviewer pointed out, given the length of the study, this is a more appropriate way to assess the influence of time than q2-longitudinal.\n",
    "    \n",
    "Before arriving at the final model, we compared models with NDVI mean and grazing, NDVI mean alone, and grazing alone, as before.</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 259,
   "id": "8350bec7",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#Install brms, nlme, bayesplot, and loo packages for the new Bayesian analysis\n",
    "\n",
    "# install.packages('brms')\n",
    "\n",
    "# install.packages('nlme')\n",
    "\n",
    "# install.packages('bayesplot')\n",
    "\n",
    "# install.packages('loo')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d57a4372",
   "metadata": {},
   "source": [
    "<h3>Start with a model that includes both NDVI mean and grazing (the \"everything\" model).</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 260,
   "id": "3b37e1eb",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "data <- read.csv(\"metadata_2.csv\", header=TRUE)\n",
    "\n",
    "library(ggplot2)\n",
    "library(dplyr)\n",
    "library(brms)\n",
    "library(nlme)\n",
    "library(bayesplot)\n",
    "library(loo)\n",
    "\n",
    "data$Plate <- as.factor(data$Plate)\n",
    "data$Sex <- as.factor(data$Sex)\n",
    "data$Elephant_ID <- as.factor(data$Elephant_ID)\n",
    "data$Grazing <- as.factor(data$Grazing)\n",
    "\n",
    "data <- as.data.frame(data)\n",
    "\n",
    "# hist(data$Alpha.Diversity) #lognormal distribution\n",
    "\n",
    "data$Date.Sampled <- as.Date(data$Date.Sampled,format=\"%m/%d/%y\")\n",
    "data$Time <- as.numeric(data$Date.Sampled)\n",
    "data$Time_stdz <- scale(data$Time)\n",
    "data <- data[order(data$Elephant_ID, data$Time), ]\n",
    "\n",
    "# str(data)\n",
    "\n",
    "# #Define weakly informative priors\n",
    "\n",
    "# priors <- c(\n",
    "#   prior(student_t(3,0,5), class = \"Intercept\"), #intercept heavy-tailed but centered at zero (Gelman et al. 2008)\n",
    "#   prior(normal(0,20), class = \"b\"), #Fixed effects: allow moderate-sized effects while helping convergence (Gelman 2013)\n",
    "#   prior(exponential(1), class = \"sd\"), #random intercept and slope SD's: concentrate near zero but allow a wide range (Gelman 2006)\n",
    "#   prior(exponential(1), class = \"sigma\") #Residual SD: same rationale (Gelman 2006)\n",
    "# )\n",
    "\n",
    "# #Model with everything (NDVI mean and grazing)\n",
    "\n",
    "# model_brms_everything <- brm(\n",
    "#   bf(Alpha.Diversity ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        NDVI_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = \"fs\", xt = list(bs = \"tp\")) + # Factor-smooth spline of Time for each elephant, a minimal structure to account for temporal autocorrelation within individuals. More explicit correlation structures could not be used because sampling intervals were irregular and some elephants had only one sample (Adams 2023; Zuur et al. 2009 on not needing a perfect structure; MacNab and Dean 2001 on splines for temporal modeling). More common structures such as gp() were problematic, possibly because too many elephants had only one sample. Thin plate regression splines: Wood 2003, 2017\n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate)\n",
    "#      ),\n",
    "#   data = data,\n",
    "#   family = lognormal(link = \"identity\"),  # Identity link on mu (the mean of log(Alpha.Diversity)); this is the brms default for lognormal\n",
    "#   prior = priors,\n",
    "#   chains = 4,\n",
    "#   cores = 4,\n",
    "#   iter = 8000,\n",
    "#   warmup = 4000,\n",
    "#   control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    "# )\n",
    "\n",
    "# For model comparison: Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO) (Vehtari, Gelman & Gabry 2017; Vehtari, Gabry, Magnusson, Yao, Bürkner, Paananen & Gelman 2024)\n",
    "model_brms_everything_before_priors_adjusted <- readRDS(\"model_brms_everything_before_priors_adjusted.rds\")\n",
    "\n",
    "loo_result_everything_before_priors_adjusted <- loo(model_brms_everything_before_priors_adjusted)\n",
    "print(loo_result_everything_before_priors_adjusted)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7308991f",
   "metadata": {},
   "source": [
    "<h4>The cell below was used to save the model above after its initial run.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 261,
   "id": "c6f0fba8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "#Saved model after running\n",
    "\n",
    "# saveRDS(model_brms_everything, \"model_brms_everything_before_priors_adjusted.rds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5bd26d0d",
   "metadata": {},
   "source": [
    "<h3>Before continuing, run a prior predictive check (sample_prior = \"only\") to see which priors are most appropriate given the distribution of alpha diversity values. Normal(0,20) for the fixed effects is likely vague enough while still aiding convergence, but Normal(0,100) might be better because we have no prior knowledge of effect sizes.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 262,
   "id": "ce5b82f6",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "brm(\n",
    "  bf(Alpha.Diversity ~\n",
    "       Sex * Age.at.Sampling.stdz +\n",
    "       Livestock_avg_of_avgs_stdz +\n",
    "       NDVI_stdz +\n",
    "       Grazing +\n",
    "       Time_stdz + \n",
    "       s(Time, Elephant_ID, bs = \"fs\", xt = list(bs = \"tp\")) + # Factor-smooth spline of Time for each elephant, a minimal structure to account for temporal autocorrelation within individuals. More explicit correlation structures could not be used because sampling intervals were irregular and some elephants had only one sample (Adams 2023; Zuur et al. 2009 on not needing a perfect structure; MacNab and Dean 2001 on splines for temporal modeling). More common structures such as gp() were problematic, possibly because too many elephants had only one sample. Thin plate regression splines: Wood 2003, 2017\n",
    "       (1 | Elephant_ID) +\n",
    "       (1 | Plate)\n",
    "     ),\n",
    "  data = data,\n",
    "  family = lognormal(link = \"identity\"),\n",
    "  prior = c(\n",
    "  prior(student_t(3,0,5), class = \"Intercept\"), #intercept heavy-tailed but centered at zero (Gelman et al. 2008)\n",
    "  prior(normal(0,20), class = \"b\"), #Fixed effects: allow moderate-sized effects while helping convergence (Gelman 2013)\n",
    "  prior(exponential(1), class = \"sd\"), #random intercept and slope SD's: concentrate near zero but allow a wide range (Gelman 2006)\n",
    "  prior(exponential(1), class = \"sigma\") #Residual SD: same rationale (Gelman 2006)\n",
    "),\n",
    "  sample_prior=\"only\", #This line tells brms to ignore the data\n",
    "  chains = 4,\n",
    "  cores = 4,\n",
    "  iter = 8000,\n",
    "  warmup = 4000,\n",
    "  control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e190921",
   "metadata": {},
   "source": [
    "<h3>The 95% CIs show that these priors still do not allow for the possibility of large (and therefore more biologically meaningful) effects. I will try Normal(0,100) as the prior for the fixed effects. I will also explicitly set the prior for the spline SD (class \"sds\") to exponential(1) instead of letting brms use its default Student-t prior. Finally, the intercept may be better centered around 350, roughly the mean of the alpha diversity values, so I will use student_t(3, 350, 100).</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 263,
   "id": "c84db627",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "brm(\n",
    "  bf(Alpha.Diversity ~\n",
    "       Sex * Age.at.Sampling.stdz +\n",
    "       Livestock_avg_of_avgs_stdz +\n",
    "       NDVI_stdz +\n",
    "       Grazing +\n",
    "       Time_stdz + \n",
    "       s(Time, Elephant_ID, bs = \"fs\", xt = list(bs = \"tp\")) + # Factor-smooth spline of Time for each elephant, a minimal structure to account for temporal autocorrelation within individuals. More explicit correlation structures could not be used because sampling intervals were irregular and some elephants had only one sample (Adams 2023; Zuur et al. 2009 on not needing a perfect structure; MacNab and Dean 2001 on splines for temporal modeling). More common structures such as gp() were problematic, possibly because too many elephants had only one sample. Thin plate regression splines: Wood 2003, 2017\n",
    "       (1 | Elephant_ID) +\n",
    "       (1 | Plate)\n",
    "     ),\n",
    "  data = data,\n",
    "  family = lognormal(link = \"identity\"),\n",
    "  prior = c(\n",
    "  prior(student_t(3,350,100), class = \"Intercept\"), #intercept heavy-tailed and centered around mean (Gelman et al. 2008)\n",
    "  prior(normal(0,100), class = \"b\"), #Fixed effects: allow large effects but centered at zero, as we have no prior knowledge (Gelman 2013)\n",
    "  prior(exponential(1), class = \"sd\"),#random intercept and slope SD's: concentrate near zero but allow a wide range (Gelman 2006)\n",
    "  prior(exponential(1), class = \"sds\"), #prior for spline error (Burkner 2017, Simpson et al. 2017)\n",
    "  prior(exponential(1), class = \"sigma\") #Residual SD: same rationale (Gelman 2006)\n",
    "),\n",
    "  sample_prior=\"only\", #This line tells brms to ignore the data\n",
    "  chains = 4,\n",
    "  cores = 4,\n",
    "  iter = 8000,\n",
    "  warmup = 4000,\n",
    "  control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7276813b",
   "metadata": {},
   "source": [
    "<h3>These values look as intended, so we will use these priors for the remaining models.</h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b581a8d6",
   "metadata": {},
   "source": [
    "<h4>Below, the model with everything is refit with the new priors.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 264,
   "id": "709f018f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# #Define more weakly informative priors that better fit the scale of the data\n",
    "\n",
    "# priors <- c(\n",
    "#   prior(student_t(3,350,100), class = \"Intercept\"), #intercept heavy-tailed and centered around the mean alpha diversity (Gelman et al. 2008)\n",
    "#   prior(normal(0,100), class = \"b\"), #Fixed effects: allow large effects but centered at zero, as we have no prior knowledge (Gelman 2013)\n",
    "#   prior(exponential(1), class = \"sd\"), #random intercept and slope SD's: concentrate near zero but allow a wide range (Gelman 2006)\n",
    "#   prior(exponential(1), class = \"sds\"), #prior for spline error (Burkner 2017, Simpson et al. 2017)\n",
    "#   prior(exponential(1), class = \"sigma\") #Residual SD: same rationale (Gelman 2006)\n",
    "# )\n",
    "\n",
    "# #Model with everything (NDVI mean and grazing)\n",
    "\n",
    "# model_brms_everything <- brm(\n",
    "#   bf(Alpha.Diversity ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        NDVI_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = \"fs\", xt = list(bs = \"tp\")) + # Factor-smooth spline of Time for each elephant, a minimal structure to account for temporal autocorrelation within individuals. More explicit correlation structures could not be used because sampling intervals were irregular and some elephants had only one sample (Adams 2023; Zuur et al. 2009 on not needing a perfect structure; MacNab and Dean 2001 on splines for temporal modeling). More common structures such as gp() were problematic, possibly because too many elephants had only one sample. Thin plate regression splines: Wood 2003, 2017\n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate)\n",
    "#      ),\n",
    "#   data = data,\n",
    "#   family = lognormal(link = \"identity\"),  # Identity link on mu (the mean of log(Alpha.Diversity)); this is the brms default for lognormal\n",
    "#   prior = priors,\n",
    "#   chains = 4,\n",
    "#   cores = 4,\n",
    "#   iter = 8000,\n",
    "#   warmup = 4000,\n",
    "#   control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    "# )\n",
    "\n",
    "# # For model comparison: Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO) (Vehtari, Gelman & Gabry 2017; Vehtari, Gabry, Magnusson, Yao, Bürkner, Paananen & Gelman 2024)\n",
    "\n",
    "model_brms_everything <- readRDS(\"model_brms_everything.rds\")\n",
    "\n",
    "loo_result_everything <- loo(model_brms_everything)\n",
    "print(loo_result_everything)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d74d258b",
   "metadata": {},
   "source": [
    "<h4>The cell below was used to save the model above after its initial run.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 265,
   "id": "23688f2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# #Save model\n",
    "\n",
    "# saveRDS(model_brms_everything, \"model_brms_everything.rds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f9cd506",
   "metadata": {},
   "source": [
    "<h3>Model with NDVI mean only</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 266,
   "id": "bc6f9f76",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# #Define weakly informative priors that better fit the scale of the data\n",
    "\n",
    "# priors <- c(\n",
    "#   prior(student_t(3,350,100), class = \"Intercept\"), #intercept heavy-tailed and centered around the mean alpha diversity (Gelman et al. 2008)\n",
    "#   prior(normal(0,100), class = \"b\"), #Fixed effects: allow large effects but centered at zero, as we have no prior knowledge (Gelman 2013)\n",
    "#   prior(exponential(1), class = \"sd\"), #random intercept and slope SD's: concentrate near zero but allow a wide range (Gelman 2006)\n",
    "#   prior(exponential(1), class = \"sds\"), #prior for spline error (Burkner 2017, Simpson et al. 2017)\n",
    "#   prior(exponential(1), class = \"sigma\") #Residual SD: same rationale (Gelman 2006)\n",
    "# )\n",
    "\n",
    "# #Model with only NDVI mean\n",
    "\n",
    "# model_brms_NDVI_mean <- brm(\n",
    "#   bf(Alpha.Diversity ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        NDVI_stdz +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = \"fs\", xt = list(bs = \"tp\")) + # Factor-smooth spline of Time for each elephant, a minimal structure to account for temporal autocorrelation within individuals. More explicit correlation structures could not be used because sampling intervals were irregular and some elephants had only one sample (Adams 2023; Zuur et al. 2009 on not needing a perfect structure; MacNab and Dean 2001 on splines for temporal modeling). More common structures such as gp() were problematic, possibly because too many elephants had only one sample. Thin plate regression splines: Wood 2003, 2017\n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate)\n",
    "#      ),\n",
    "#   data = data,\n",
    "#   family = lognormal(link = \"identity\"),  # Identity link on mu (the mean of log(Alpha.Diversity)); this is the brms default for lognormal\n",
    "#   prior = priors,\n",
    "#   chains = 4,\n",
    "#   cores = 4,\n",
    "#   iter = 8000,\n",
    "#   warmup = 4000,\n",
    "#   control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    "# )\n",
    "\n",
    "# For model comparison: Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO) (Vehtari, Gelman & Gabry 2017; Vehtari, Gabry, Magnusson, Yao, Bürkner, Paananen & Gelman 2024)\n",
    "\n",
    "model_brms_NDVI_mean <- readRDS(\"model_brms_NDVI_mean.rds\")\n",
    "\n",
    "loo_result_NDVI_mean <- loo(model_brms_NDVI_mean)\n",
    "print(loo_result_NDVI_mean)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ec2ccd4",
   "metadata": {},
   "source": [
    "<h4>The cell below was used to save the model above after its initial run.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 267,
   "id": "958eb5b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# #Save model\n",
    "\n",
    "# saveRDS(model_brms_NDVI_mean, \"model_brms_NDVI_mean.rds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d9445a8",
   "metadata": {},
   "source": [
    "<h3>Model with grazing only</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 268,
   "id": "3382d39a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# #Define weakly informative priors that better fit the scale of the data\n",
    "\n",
    "# priors <- c(\n",
    "#   prior(student_t(3,350,100), class = \"Intercept\"), #intercept heavy-tailed and centered around the mean alpha diversity (Gelman et al. 2008)\n",
    "#   prior(normal(0,100), class = \"b\"), #Fixed effects: allow large effects but centered at zero, as we have no prior knowledge (Gelman 2013)\n",
    "#   prior(exponential(1), class = \"sd\"), #random intercept and slope SD's: concentrate near zero but allow a wide range (Gelman 2006)\n",
    "#   prior(exponential(1), class = \"sds\"), #prior for spline error (Burkner 2017, Simpson et al. 2017)\n",
    "#   prior(exponential(1), class = \"sigma\") #Residual SD: same rationale (Gelman 2006)\n",
    "# )\n",
    "\n",
    "# #Model with only grazing\n",
    "\n",
    "# model_brms_Grazing <- brm(\n",
    "#   bf(Alpha.Diversity ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = \"fs\", xt = list(bs = \"tp\")) + # Factor-smooth spline of Time for each elephant, a minimal structure to account for temporal autocorrelation within individuals. More explicit correlation structures could not be used because sampling intervals were irregular and some elephants had only one sample (Adams 2023; Zuur et al. 2009 on not needing a perfect structure; MacNab and Dean 2001 on splines for temporal modeling). More common structures such as gp() were problematic, possibly because too many elephants had only one sample. Thin plate regression splines: Wood 2003, 2017\n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate)\n",
    "#      ),\n",
    "#   data = data,\n",
    "#   family = lognormal(link = \"identity\"),  # Identity link on mu (the mean of log(Alpha.Diversity)); this is the brms default for lognormal\n",
    "#   prior = priors,\n",
    "#   chains = 4,\n",
    "#   cores = 4,\n",
    "#   iter = 8000,\n",
    "#   warmup = 4000,\n",
    "#   control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    "# )\n",
    "\n",
    "# For model comparison: Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO) (Vehtari, Gelman & Gabry 2017; Vehtari, Gabry, Magnusson, Yao, Bürkner, Paananen & Gelman 2024)\n",
    "\n",
    "model_brms_Grazing <- readRDS(\"model_brms_Grazing.rds\")\n",
    "\n",
    "loo_result_Grazing <- loo(model_brms_Grazing)\n",
    "print(loo_result_Grazing)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99ff1b8e",
   "metadata": {},
   "source": [
    "<h4>The cell below was used to save the model above after its initial run.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 269,
   "id": "8965df8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# #Save model\n",
    "\n",
    "# saveRDS(model_brms_Grazing, \"model_brms_Grazing.rds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c93fc04",
   "metadata": {},
   "source": [
    "<h3>Direct comparison of the three models</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 270,
   "id": "e598c948",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "loo_compare(loo_result_everything, loo_result_NDVI_mean, loo_result_Grazing)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55b4521e",
   "metadata": {},
   "source": [
    "<h3>The LOO estimates for all three models are close, so results for all of them are shown below, but the grazing model is the one we use in the MS.</h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4e576a0",
   "metadata": {},
   "source": [
    "<h2>View summary output of all models</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 271,
   "id": "9dadd91c",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "summary(model_brms_everything)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28e027fc",
   "metadata": {},
   "source": [
    "<h3>This and the results below suggest that NDVI mean and grazing compete for the same variation and mask each other's effects, likely because the two predictors are correlated.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 272,
   "id": "053c2de3",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "summary(model_brms_NDVI_mean)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9887690b",
   "metadata": {},
   "source": [
    "<h2>The grazing and NDVI models give essentially the same output, suggesting that differences associated with NDVI are likely due to the seasonal diet switch.</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4da6c022",
   "metadata": {},
   "source": [
    "<h4>Below are the results we present in the MS.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 273,
   "id": "f1abfe52",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "summary(model_brms_Grazing) ##full summary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec7546f5",
   "metadata": {},
   "source": [
    "<h4>Posterior predictive checks</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 274,
   "id": "7c77e1af",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(model_brms_Grazing)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 275,
   "id": "f85ee7d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(model_brms_Grazing, type = \"hist\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 276,
   "id": "74e7a6b5",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(model_brms_Grazing, type = \"ecdf_overlay\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 277,
   "id": "83d8b59e",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(model_brms_Grazing, type = \"boxplot\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 278,
   "id": "d37d71ed",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(model_brms_Grazing, type = \"intervals\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 279,
   "id": "20e5aec4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(model_brms_Grazing, type = \"scatter_avg\")"
   ]
  },
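  {
   "cell_type": "markdown",
   "id": "c4e2d3b8",
   "metadata": {},
   "source": [
    "The `pp_check()` plots above compare the observed response with datasets simulated from the posterior. Below is a minimal sketch of that idea in Python with NumPy. It assumes a simple normal model and uses synthetic draws for mu and sigma, not draws from `model_brms_Grazing`. One replicated dataset is drawn per posterior draw, and a test statistic (here the standard deviation) is compared with its observed value.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(2)\n",
    "\n",
    "# Hypothetical observed response and hypothetical posterior draws\n",
    "# for a simple normal model y ~ N(mu, sigma)\n",
    "y = rng.normal(5.6, 0.7, size=150)\n",
    "mu_draws = rng.normal(y.mean(), 0.05, size=1000)\n",
    "sigma_draws = np.abs(rng.normal(y.std(ddof=1), 0.04, size=1000))\n",
    "\n",
    "# One replicated dataset per posterior draw: shape (draws, observations)\n",
    "y_rep = rng.normal(mu_draws[:, None], sigma_draws[:, None], size=(1000, y.size))\n",
    "\n",
    "# Compare a test statistic between replicated and observed data\n",
    "stat_rep = y_rep.std(axis=1, ddof=1)\n",
    "stat_obs = y.std(ddof=1)\n",
    "ppp = np.mean(stat_rep >= stat_obs)  # posterior predictive p-value\n",
    "print(f'posterior predictive p-value for the SD: {ppp:.2f}')\n",
    "```\n",
    "\n",
    "A posterior predictive p-value near 0 or 1 would indicate that the model fails to reproduce that feature of the data. Values near 0.5 indicate agreement for the chosen statistic."
   ]
  },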
  {
   "cell_type": "code",
   "execution_count": 280,
   "id": "0978cfa0",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# List parameter names for use in the convergence checks below\n",
    "posterior_array <- as.array(model_brms_Grazing)\n",
    "dimnames(posterior_array)[[3]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "538fe620",
   "metadata": {},
   "source": [
    "<h2>Chain convergence checks and outcome for each parameter<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbca0cc7",
   "metadata": {},
   "source": [
    "<h3>FIXED EFFECTS<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8bf199eb",
   "metadata": {},
   "source": [
    "<h4>Age<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 281,
   "id": "681ad5b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# age_trace <- mcmc_trace(posterior_array, pars = c(\"b_Age.at.Sampling.stdz\"))\n",
    "# age_overlay <- mcmc_dens_overlay(posterior_array, pars = c(\"b_Age.at.Sampling.stdz\"))\n",
    "# age_hist <- mcmc_hist(posterior_array, pars = \"b_Age.at.Sampling.stdz\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b507a4a",
   "metadata": {},
   "source": [
    "<h4>The cell below saved these plot objects after they were first generated.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 282,
   "id": "b4836ee3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(age_trace, age_overlay, age_hist, file = \"age_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 283,
   "id": "f714e9e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"age_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 284,
   "id": "8dbc6d98",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(age_trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb5c1adc",
   "metadata": {},
   "source": [
    "<h4>Chain mixing looks good.<h4>"
   ]
  },
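  {
   "cell_type": "markdown",
   "id": "d5f3e4c9",
   "metadata": {},
   "source": [
    "Trace plots give a visual check of mixing, and the R-hat values reported by `summary()` give a numerical one. Below is a minimal sketch of the classic split R-hat (Gelman et al., Bayesian Data Analysis, 3rd ed.) in Python with NumPy, using synthetic chains. Current versions of brms report a rank-normalized variant (Vehtari et al. 2021), so this sketch illustrates the idea and does not reproduce brms output exactly.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def split_rhat(draws):\n",
    "    # draws: array of shape (iterations, chains) for one parameter\n",
    "    n_iter, n_chains = draws.shape\n",
    "    half = n_iter // 2\n",
    "    # Split each chain in half so within-chain trends also inflate R-hat\n",
    "    splits = np.concatenate([draws[:half], draws[half:2 * half]], axis=1)\n",
    "    n = splits.shape[0]\n",
    "    chain_means = splits.mean(axis=0)\n",
    "    chain_vars = splits.var(axis=0, ddof=1)\n",
    "    W = chain_vars.mean()                # within-chain variance\n",
    "    B = n * chain_means.var(ddof=1)      # between-chain variance\n",
    "    var_hat = (n - 1) / n * W + B / n    # pooled posterior variance estimate\n",
    "    return np.sqrt(var_hat / W)\n",
    "\n",
    "rng = np.random.default_rng(3)\n",
    "good = rng.normal(size=(1000, 4))            # 4 well-mixed synthetic chains\n",
    "bad = good + np.array([0.0, 0.0, 0.0, 2.0])  # one chain stuck elsewhere\n",
    "print(round(split_rhat(good), 3), round(split_rhat(bad), 3))\n",
    "```\n",
    "\n",
    "Values close to 1 indicate that the chains agree with each other. A single chain stuck in a different region inflates R-hat well above 1."
   ]
  },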
  {
   "cell_type": "code",
   "execution_count": 285,
   "id": "1674b72a",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(age_overlay)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c27d449d",
   "metadata": {},
   "source": [
    "<h4>Per-chain densities overlap well.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 286,
   "id": "ddd6f212",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(age_hist)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80f3eb91",
   "metadata": {},
   "source": [
    "<h3><font color='green'>No apparent effect of age on alpha diversity </font><h3>"
   ]
  },
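  {
   "cell_type": "markdown",
   "id": "e6a4f5da",
   "metadata": {},
   "source": [
    "Conclusions such as 'no apparent effect' are typically judged by whether a coefficient's posterior mass sits clearly on one side of zero. Below is a minimal sketch in Python with NumPy of two common summaries. It uses synthetic draws standing in for a coefficient such as `b_Age.at.Sampling.stdz`. The two summaries are the 95% equal-tailed credible interval and the probability of direction (the share of draws with the majority sign).\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(4)\n",
    "# Hypothetical posterior draws for one standardized coefficient\n",
    "b_draws = rng.normal(0.02, 0.08, size=4000)\n",
    "\n",
    "# 95% equal-tailed credible interval\n",
    "lo, hi = np.quantile(b_draws, [0.025, 0.975])\n",
    "# Probability of direction: share of draws with the majority sign\n",
    "p_direction = max(np.mean(b_draws > 0), np.mean(b_draws < 0))\n",
    "excludes_zero = lo > 0 or hi < 0\n",
    "print(f'95% CrI [{lo:.3f}, {hi:.3f}], P(direction) = {p_direction:.2f}, excludes 0: {excludes_zero}')\n",
    "```\n",
    "\n",
    "With the fitted model, the draws for one coefficient could be obtained in R with, for example, `as_draws_df(model_brms_Grazing)$b_Age.at.Sampling.stdz`."
   ]
  },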
  {
   "cell_type": "markdown",
   "id": "55a1811b",
   "metadata": {},
   "source": [
    "<h4>Grazing<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 287,
   "id": "af1287da",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# grazing_trace <- mcmc_trace(posterior_array, pars = c(\"b_Grazing1\"))\n",
    "# grazing_overlay <- mcmc_dens_overlay(posterior_array, pars = c(\"b_Grazing1\"))\n",
    "# grazing_hist <- mcmc_hist(posterior_array, pars = \"b_Grazing1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54465d9d",
   "metadata": {},
   "source": [
    "<h4>The cell below saved these plot objects after they were first generated.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 288,
   "id": "8412d874",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(grazing_trace, grazing_overlay, grazing_hist, file = \"grazing_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 289,
   "id": "0d258dae",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"grazing_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 290,
   "id": "27a2bc68",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(grazing_trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7060440c",
   "metadata": {},
   "source": [
    "<h4>Chain mixing looks good.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 291,
   "id": "103c44ef",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(grazing_overlay)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dec99243",
   "metadata": {},
   "source": [
    "<h4>Per-chain densities overlap acceptably.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 292,
   "id": "211495a5",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(grazing_hist)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1adb12b",
   "metadata": {},
   "source": [
    "<h3><font color='green'>Grazing appears to be negatively associated with alpha diversity.</font><h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fef6ad62",
   "metadata": {},
   "source": [
    "<h4>Sex<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 293,
   "id": "49ef2117",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# sex_trace <- mcmc_trace(posterior_array, pars = c(\"b_SexM\"))\n",
    "# sex_overlay <- mcmc_dens_overlay(posterior_array, pars = c(\"b_SexM\"))\n",
    "# sex_hist <- mcmc_hist(posterior_array, pars = \"b_SexM\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee8540bc",
   "metadata": {},
   "source": [
    "<h4>The cell below saved these plot objects after they were first generated.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 294,
   "id": "2649a47e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(sex_trace, sex_overlay, sex_hist, file = \"sex_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 295,
   "id": "1155ebcd",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"sex_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 296,
   "id": "df592d5a",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(sex_trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "982afce6",
   "metadata": {},
   "source": [
    "<h4>Chain mixing looks good.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 297,
   "id": "bd805fdf",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(sex_overlay)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35f0b938",
   "metadata": {},
   "source": [
    "<h4>Per-chain densities overlap well.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 298,
   "id": "81c7a800",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(sex_hist)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1ca49eb",
   "metadata": {},
   "source": [
    "<h3><font color='green'>No apparent effect of sex on alpha diversity </font><h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce871d0a",
   "metadata": {},
   "source": [
    "<h4>Sex and age interaction<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 299,
   "id": "26b3e1e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# sex_age_trace <- mcmc_trace(posterior_array, pars = c(\"b_SexM:Age.at.Sampling.stdz\"))\n",
    "# sex_age_overlay <- mcmc_dens_overlay(posterior_array, pars = c(\"b_SexM:Age.at.Sampling.stdz\"))\n",
    "# sex_age_hist <- mcmc_hist(posterior_array, pars = \"b_SexM:Age.at.Sampling.stdz\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "032fc6fd",
   "metadata": {},
   "source": [
    "<h4>The cell below saved these plot objects after they were first generated.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 300,
   "id": "497a9499",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(sex_age_trace, sex_age_overlay, sex_age_hist, file = \"sex_age_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 301,
   "id": "7036efe6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"sex_age_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 302,
   "id": "4a5ab5d3",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(sex_age_trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "47c7e552",
   "metadata": {},
   "source": [
    "<h4>Chains mixed well.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 303,
   "id": "716a7dda",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(sex_age_overlay)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3914f3c",
   "metadata": {},
   "source": [
    "<h4>Per-chain densities overlap reasonably well.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 304,
   "id": "b7d98b87",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(sex_age_hist)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f53c9c3",
   "metadata": {},
   "source": [
    "<h3><font color='green'>No apparent interaction between sex and age with respect to alpha diversity</font><h3>"
   ]
  },
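  {
   "cell_type": "markdown",
   "id": "f7b5a6eb",
   "metadata": {},
   "source": [
    "Sex enters the model as a dummy variable. Assuming its two levels are F and M, the parameter name `b_SexM` implies F is the reference level. In that case the age slope for females is `b_Age.at.Sampling.stdz`, and the age slope for males is that coefficient plus `b_SexM:Age.at.Sampling.stdz`. Below is a minimal sketch in Python with NumPy that derives each sex's slope draw by draw, using synthetic draws in place of the real posterior.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(5)\n",
    "n = 4000\n",
    "# Hypothetical draws; names mirror the brms parameters but values are made up\n",
    "b_age = rng.normal(0.00, 0.07, size=n)       # age slope for the reference sex\n",
    "b_sexM_age = rng.normal(0.03, 0.10, size=n)  # difference in age slope for males\n",
    "\n",
    "# Age slope for each sex, computed within each draw\n",
    "slope_female = b_age\n",
    "slope_male = b_age + b_sexM_age\n",
    "for label, s in [('female', slope_female), ('male', slope_male)]:\n",
    "    lo, hi = np.quantile(s, [0.025, 0.975])\n",
    "    print(f'{label}: 95% CrI for age slope [{lo:.3f}, {hi:.3f}]')\n",
    "```\n",
    "\n",
    "Summing within each draw, rather than adding the two marginal intervals, preserves the posterior correlation between the two parameters."
   ]
  },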
  {
   "cell_type": "markdown",
   "id": "eeb9aa65",
   "metadata": {},
   "source": [
    "<h4>Livestock<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 305,
   "id": "fcdac738",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# livestock_trace <- mcmc_trace(posterior_array, pars = c(\"b_Livestock_avg_of_avgs_stdz\"))\n",
    "# livestock_overlay <- mcmc_dens_overlay(posterior_array, pars = c(\"b_Livestock_avg_of_avgs_stdz\"))\n",
    "# livestock_hist <- mcmc_hist(posterior_array, pars = \"b_Livestock_avg_of_avgs_stdz\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e2545df",
   "metadata": {},
   "source": [
    "<h4>The cell below saved these plot objects after they were first generated.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 306,
   "id": "ba11b387",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(livestock_trace, livestock_overlay, livestock_hist, file = \"livestock_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 307,
   "id": "6d7839cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"livestock_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 308,
   "id": "1d88fae0",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(livestock_trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1cb2e8d",
   "metadata": {},
   "source": [
    "<h4>Good mixing<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 309,
   "id": "57568fdf",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(livestock_overlay)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f24c0f93",
   "metadata": {},
   "source": [
    "<h4>Per-chain densities overlap well.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 310,
   "id": "9743b449",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(livestock_hist)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74cd19e3",
   "metadata": {},
   "source": [
    "<h3><font color='green'>It appears that more livestock in the reserves correlates with lower alpha diversity.</font><h3>"
   ]
  },
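  {
   "cell_type": "markdown",
   "id": "a8c6b7fc",
   "metadata": {},
   "source": [
    "The livestock, age and time predictors carry a `stdz` suffix. Assume they were z-scored, that is, centered on the mean and divided by one standard deviation, as R's `scale()` does. A coefficient then describes the change in alpha diversity per one standard deviation of the predictor, and dividing it by that standard deviation converts it back to raw units. Below is a minimal sketch in Python with NumPy, using a synthetic predictor and synthetic draws.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(6)\n",
    "# Hypothetical raw predictor (e.g. a livestock index) and its z-scored version\n",
    "x_raw = rng.gamma(shape=2.0, scale=15.0, size=120)\n",
    "x_mean, x_sd = x_raw.mean(), x_raw.std(ddof=1)\n",
    "x_stdz = (x_raw - x_mean) / x_sd\n",
    "\n",
    "# Hypothetical posterior draws for the coefficient on the standardized predictor\n",
    "b_stdz = rng.normal(-0.15, 0.05, size=4000)\n",
    "\n",
    "# Change in the response per one raw unit, then per 10 raw units\n",
    "b_per_unit = b_stdz / x_sd\n",
    "print(np.quantile(b_per_unit * 10, [0.025, 0.5, 0.975]))\n",
    "```\n",
    "\n",
    "If the predictors were instead scaled by two standard deviations, or not centered, the divisor would change accordingly."
   ]
  },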
  {
   "cell_type": "markdown",
   "id": "31a70541",
   "metadata": {},
   "source": [
    "<h4>Time<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 311,
   "id": "a90ec360",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# time_trace <- mcmc_trace(posterior_array, pars = c(\"b_Time_stdz\"))\n",
    "# time_overlay <- mcmc_dens_overlay(posterior_array, pars = c(\"b_Time_stdz\"))\n",
    "# time_hist <- mcmc_hist(posterior_array, pars = \"b_Time_stdz\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b069207",
   "metadata": {},
   "source": [
    "<h4>The cell below saved these plot objects after they were first generated.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 312,
   "id": "82768eed",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(time_trace, time_overlay, time_hist, file = \"time_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 313,
   "id": "f5b12f36",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"time_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 314,
   "id": "68636507",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(time_trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21a1e492",
   "metadata": {},
   "source": [
    "<h4>Chain mixing looks good.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 315,
   "id": "82c03769",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(time_overlay)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de32db5c",
   "metadata": {},
   "source": [
    "<h4>Per-chain densities overlap well.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 316,
   "id": "f2fbc2bc",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(time_hist)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62bdab67",
   "metadata": {},
   "source": [
    "<h3><font color='green'>It appears alpha diversity increased over the sampling period.</font><h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c18e7a60",
   "metadata": {},
   "source": [
    "<h4>Population-level intercept<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 317,
   "id": "685a088a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# fixed_intercept_trace <- mcmc_trace(posterior_array, pars = c(\"b_Intercept\"))\n",
    "# fixed_intercept_overlay <- mcmc_dens_overlay(posterior_array, pars = c(\"b_Intercept\"))\n",
    "# fixed_intercept_hist <- mcmc_hist(posterior_array, pars = \"b_Intercept\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "547e1a5d",
   "metadata": {},
   "source": [
    "<h4>The cell below saved these plot objects after they were first generated.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 318,
   "id": "1d59fbe2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(fixed_intercept_trace, fixed_intercept_overlay, fixed_intercept_hist, file = \"fixed_intercept_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 319,
   "id": "9bc60a41",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"fixed_intercept_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 320,
   "id": "bc115066",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(fixed_intercept_trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3b508e3",
   "metadata": {},
   "source": [
    "<h4>Mixing is not as good, but this is to be expected for a population-level intercept.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 321,
   "id": "613522af",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(fixed_intercept_overlay)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5cce9ed0",
   "metadata": {},
   "source": [
    "<h4>Per-chain densities overlap very closely.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 322,
   "id": "b47220c7",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(fixed_intercept_hist)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef7cd35b",
   "metadata": {},
   "source": [
    "<h3><font color = 'green'>The population-level intercept is estimated to be between 5 and 6, closer to 6, with a small posterior standard deviation.</font><h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e26b04a",
   "metadata": {},
   "source": [
    "<h3>RANDOM EFFECTS<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1aa48288",
   "metadata": {},
   "source": [
    "<h4>Standard deviation for the elephant ID intercept<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 323,
   "id": "9ee4ed00",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# EleID_sd_trace <- mcmc_trace(posterior_array, pars = c(\"sd_Elephant_ID__Intercept\"))\n",
    "# EleID_sd_overlay <- mcmc_dens_overlay(posterior_array, pars = c(\"sd_Elephant_ID__Intercept\"))\n",
    "# EleID_sd_hist <- mcmc_hist(posterior_array, pars = \"sd_Elephant_ID__Intercept\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc0fd097",
   "metadata": {},
   "source": [
    "<h4>The cell below saved these plot objects after they were first generated.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 324,
   "id": "1f5f3cc7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(EleID_sd_trace, EleID_sd_overlay, EleID_sd_hist, file = \"EleID_sd_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 325,
   "id": "551b7a23",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"EleID_sd_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 326,
   "id": "74dbf800",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(EleID_sd_trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ee485db",
   "metadata": {},
   "source": [
    "<h4>Mixing appears okay<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 327,
   "id": "deca0c82",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(EleID_sd_overlay)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8534cc80",
   "metadata": {},
   "source": [
    "<h4>Per-chain densities overlap well.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 328,
   "id": "af85b9df",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(EleID_sd_hist)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "093cb236",
   "metadata": {},
   "source": [
    "<h3>It is hard to tell from the histogram whether elephant ID has a discernible effect on alpha diversity. Therefore, we directly extract the 95% credible interval for the standard deviation of the elephant ID intercept.<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 329,
   "id": "35ae3498",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# Check the 95% credible interval of the elephant ID intercept SD to assess whether elephant ID correlates with alpha diversity.\n",
    "# Index rows by the interval matrix's own row names so the rows cannot be misaligned\n",
    "ci_95 <- posterior_interval(model_brms_Grazing, prob = 0.95)\n",
    "ci_95[grep(\"sd_Elephant_ID__Intercept\", rownames(ci_95)), ]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d0ffddd",
   "metadata": {},
   "source": [
    "<h3><font color='green'>It looks like Elephant ID does have a discernible effect on alpha diversity.</font><h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "763abced",
   "metadata": {},
   "source": [
    "<h4>Standard deviation for plate intercept<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 330,
   "id": "87c03ac4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# Plate_sd_trace <- mcmc_trace(posterior_array, pars = c(\"sd_Plate__Intercept\"))\n",
    "# Plate_sd_overlay <- mcmc_dens_overlay(posterior_array, pars = c(\"sd_Plate__Intercept\"))\n",
    "# Plate_sd_hist <- mcmc_hist(posterior_array, pars = \"sd_Plate__Intercept\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d26266d5",
   "metadata": {},
   "source": [
    "<h4>The cell below saved these plot objects after they were first generated.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 331,
   "id": "c1905e24",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(Plate_sd_trace, Plate_sd_overlay, Plate_sd_hist, file = \"Plate_sd_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 332,
   "id": "3e309b9a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"Plate_sd_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 333,
   "id": "33448989",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(Plate_sd_trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8371028",
   "metadata": {},
   "source": [
    "<h4>Chain mixing is acceptable.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 334,
   "id": "7d2e98ea",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(Plate_sd_overlay)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36f6a55b",
   "metadata": {},
   "source": [
    "<h4>Per-chain densities overlap very closely.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 335,
   "id": "caefca72",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(Plate_sd_hist)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "254ccd65",
   "metadata": {},
   "source": [
    "<h3><font color='green'>Plate appears to affect alpha diversity measures.</font><h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44a6bc36",
   "metadata": {},
   "source": [
    "<h4>Sigma (residual standard deviation), to gauge prediction error<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 336,
   "id": "a10ab298",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# sigma_trace <- mcmc_trace(posterior_array, pars = c(\"sigma\"))\n",
    "# sigma_overlay <- mcmc_dens_overlay(posterior_array, pars = c(\"sigma\"))\n",
    "# sigma_hist <- mcmc_hist(posterior_array, pars = \"sigma\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9346e512",
   "metadata": {},
   "source": [
    "<h4>The cell below saved these plot objects after they were first generated.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 337,
   "id": "30159adc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(sigma_trace, sigma_overlay, sigma_hist, file = \"sigma_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 338,
   "id": "d32d4d98",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"sigma_vars_alpha_model.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 339,
   "id": "200809e6",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(sigma_trace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89205c5a",
   "metadata": {},
   "source": [
    "<h4>Chain mixing looks as expected.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 340,
   "id": "af487f23",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(sigma_overlay)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14726abd",
   "metadata": {},
   "source": [
    "<h4>Per-chain densities overlap well.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 341,
   "id": "a8782ded",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "print(sigma_hist)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4177840c",
   "metadata": {},
   "source": [
    "<h3><font color='green'>Residual standard deviation (sigma) estimated at about 0.6 to 0.8 units of the alpha diversity metric</font><h3>"
   ]
  },
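  {
   "cell_type": "markdown",
   "id": "b9d7c80d",
   "metadata": {},
   "source": [
    "Sigma can also be compared with the two group-level standard deviations to gauge how much variation each source accounts for. Below is a minimal sketch in Python with NumPy, using synthetic draws standing in for `sd_Elephant_ID__Intercept`, `sd_Plate__Intercept` and `sigma`. For each draw, the share attributable to elephant ID is `sd_id^2 / (sd_id^2 + sd_plate^2 + sigma^2)`. This is an intraclass-correlation-style ratio that ignores variance explained by the fixed effects.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(8)\n",
    "n = 4000\n",
    "# Hypothetical draws standing in for the two group-level SDs and sigma\n",
    "sd_id = np.abs(rng.normal(0.30, 0.08, size=n))\n",
    "sd_plate = np.abs(rng.normal(0.25, 0.10, size=n))\n",
    "sigma = np.abs(rng.normal(0.70, 0.05, size=n))\n",
    "\n",
    "# Share of group-level plus residual variance attributable to each source, per draw\n",
    "total = sd_id**2 + sd_plate**2 + sigma**2\n",
    "icc_id = sd_id**2 / total\n",
    "icc_plate = sd_plate**2 / total\n",
    "print('elephant ID:', np.quantile(icc_id, [0.025, 0.5, 0.975]).round(3))\n",
    "print('plate:', np.quantile(icc_plate, [0.025, 0.5, 0.975]).round(3))\n",
    "```\n",
    "\n",
    "With the fitted model, the three columns could be taken from `as_draws_df(model_brms_Grazing)` and passed through the same arithmetic."
   ]
  },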
  {
   "cell_type": "markdown",
   "id": "e9f903fc",
   "metadata": {},
   "source": [
    "<h4>Check variance explained by the model (Bayesian R<sup>2</sup>)<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 342,
   "id": "35145830",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "bayes_R2(model_brms_Grazing)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1ec63c3",
   "metadata": {},
   "source": [
    "<h3><font color='green'>The 95% credible interval for Bayesian R<sup>2</sup> indicates that the model explains between 9.5% and 30.9% of the variance.</font><h3>"
   ]
  },
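  {
   "cell_type": "markdown",
   "id": "cae8d91e",
   "metadata": {},
   "source": [
    "For reference, below is a minimal sketch in Python with NumPy of the Bayesian R<sup>2</sup> idea (Gelman et al. 2019) that `bayes_R2()` is based on. For each posterior draw, it divides the variance of the fitted values by the sum of that variance and the residual variance. The data and draws below are synthetic. By default, brms includes the group-level effects (elephant ID and plate) in the fitted values, so the R<sup>2</sup> reported above reflects them too.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(9)\n",
    "n_draws, n_obs = 2000, 150\n",
    "# Hypothetical data and hypothetical posterior draws of the expected value mu\n",
    "signal = rng.normal(size=n_obs)\n",
    "y = 5.6 + 0.3 * signal + rng.normal(scale=0.7, size=n_obs)\n",
    "coef = rng.normal(0.3, 0.05, size=n_draws)\n",
    "mu = 5.6 + coef[:, None] * signal[None, :]  # shape (draws, observations)\n",
    "\n",
    "# Bayesian R^2 per draw: var(fitted) / (var(fitted) + var(residuals))\n",
    "resid = y[None, :] - mu\n",
    "var_fit = mu.var(axis=1, ddof=1)\n",
    "var_res = resid.var(axis=1, ddof=1)\n",
    "r2 = var_fit / (var_fit + var_res)\n",
    "print(np.quantile(r2, [0.025, 0.5, 0.975]).round(3))\n",
    "```\n",
    "\n",
    "The spread of R<sup>2</sup> across draws is what yields an interval rather than a single value."
   ]
  },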
  {
   "cell_type": "markdown",
   "id": "2ed77a67",
   "metadata": {},
   "source": [
    "<h4>Check that temporal autocorrelation is not an issue in the model<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 343,
   "id": "ed6ab687",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# Extract residuals (observed - fitted) from the brms model;\n",
    "# summary = FALSE returns a matrix of posterior draws x observations\n",
    "resid_data <- residuals(model_brms_Grazing, summary = FALSE)\n",
    "\n",
    "# Posterior mean residual for each observation (averaged over draws)\n",
    "resid_mean <- apply(resid_data, 2, mean)\n",
    "\n",
    "# Add residuals to the original data (rows must match the data used to fit the model)\n",
    "data$residuals <- resid_mean\n",
    "\n",
    "# Compute ACFs per elephant, for elephants with enough repeated measures\n",
    "\n",
    "# Select elephants with more than 5 samples\n",
    "valid_elephants <- data %>%\n",
    "  group_by(Elephant_ID) %>%\n",
    "  filter(n() > 5) %>%\n",
    "  pull(Elephant_ID) %>%\n",
    "  unique()\n",
    "\n",
    "# Store one ACF object per elephant in a named list.\n",
    "# acf() assumes the series is in temporal order, so sort each elephant's\n",
    "# residuals by the standardized sampling time (Time_stdz) first\n",
    "acf_list <- lapply(valid_elephants, function(id) {\n",
    "  series <- data %>%\n",
    "    filter(Elephant_ID == id) %>%\n",
    "    arrange(Time_stdz) %>%\n",
    "    pull(residuals)\n",
    "  acf(series, plot = FALSE)\n",
    "})\n",
    "names(acf_list) <- valid_elephants"
   ]
  },
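  {
   "cell_type": "markdown",
   "id": "dbf9ea2f",
   "metadata": {},
   "source": [
    "Below is a minimal sketch in Python with NumPy of what `acf()` computes for each elephant's residual series. It also shows the approximate 95% bounds that the plot method draws as blue dashed lines (plus or minus 1.96 / sqrt(n) for a series of length n). The residual series is synthetic. For the lags to be meaningful, the series must be in temporal order.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sample_acf(x, max_lag):\n",
    "    # Sample autocorrelation as in R's acf(): the lag-k autocovariance and\n",
    "    # the variance both use the full-length denominator n, so it cancels\n",
    "    x = np.asarray(x, dtype=float)\n",
    "    d = x - x.mean()\n",
    "    denom = np.sum(d * d)\n",
    "    return np.array([np.sum(d[: len(d) - k] * d[k:]) / denom for k in range(max_lag + 1)])\n",
    "\n",
    "rng = np.random.default_rng(10)\n",
    "resid = rng.normal(size=12)  # hypothetical residual series for one elephant, in time order\n",
    "acf_vals = sample_acf(resid, max_lag=6)\n",
    "bound = 1.96 / np.sqrt(resid.size)  # approximate 95% white-noise bounds\n",
    "crossing = [k for k in range(1, acf_vals.size) if abs(acf_vals[k]) > bound]\n",
    "print(acf_vals.round(2), round(bound, 3), crossing)\n",
    "```\n",
    "\n",
    "Lag 0 is always 1. Under white noise, each later lag has about a 5% chance of crossing the bounds, so occasional crossings across many elephants and lags are expected."
   ]
  },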
  {
   "cell_type": "markdown",
   "id": "ceab102b",
   "metadata": {},
   "source": [
    "<h4>Plot by elephant, for elephants with >5 samples<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 344,
   "id": "1d0f908f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# List the elephants with ACF results\n",
    "names(acf_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 345,
   "id": "d99771de",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M25.00\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 346,
   "id": "be9387b4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M26.05\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 347,
   "id": "91069598",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M31.04\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 348,
   "id": "7ba1417b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M32.04\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 349,
   "id": "02f31e90",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M4.02\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 350,
   "id": "9ba7bf4e",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M45.01\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 351,
   "id": "bd5d74fc",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M6.99\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 352,
   "id": "b44fa9ba",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M63.01\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 353,
   "id": "a370ebbc",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M63.94\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 354,
   "id": "7cdb1ab4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M64.95\"]])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7db0887",
   "metadata": {},
   "source": [
    "<h2><font color='black'>Figures added in response to reviewer suggestions</font><h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 355,
   "id": "d0349979",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M7.8905\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 356,
   "id": "6e6f0b39",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"M9.02\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 357,
   "id": "d698d9dd",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R19.8801\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 358,
   "id": "5cf0ae93",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R21.04\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 359,
   "id": "5a2d80e3",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R22.03\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 360,
   "id": "7b3156e9",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R22.06\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 361,
   "id": "c12728f6",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R22.8904\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 362,
   "id": "fa49050b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R25.03\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 363,
   "id": "8d1e8adb",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R25.9002\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 364,
   "id": "6e4a1933",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R26.00\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 365,
   "id": "9a121292",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R27.04\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 366,
   "id": "3ecb5574",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R28.03\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 367,
   "id": "37f14bfe",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R28.99\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 368,
   "id": "b1eccf6b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R29.03\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 369,
   "id": "dea30817",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R36.06\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 370,
   "id": "8a80b9e1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"R38.01\"]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 371,
   "id": "efb0c6b6",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "plot(acf_list[[\"S92.06\"]])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f23dc3f",
   "metadata": {},
   "source": [
    "<h3><font color='green'>One elephant (R27.04) still shows some temporal autocorrelation: two lags other than lag 0 cross the blue dashed confidence bounds. A second elephant (R36.06) has one lag beyond lag 0 that barely crosses the bound (the lag-0 spike is always 1 and can be ignored). Across this many ACF plots, a few isolated crossings of an approximate 95% bound are expected by chance, so this level of residual autocorrelation is acceptable (Chatfield 2003, Zuur et al. 2009, Shumway and Stoffer 2017).</font></h3>"
   ]
  },
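  {
   "cell_type": "markdown",
   "id": "c3a1f7e0",
   "metadata": {},
   "source": [
    "<h3>A minimal sketch of what the ACF plots above display, using hypothetical simulated data rather than the model residuals. It computes the sample autocorrelation with the same 1/n autocovariance estimator as R's acf(), and the approximate 95% white-noise bound (±1.96/√n) that plot.acf() draws as blue dashed lines. The helper names sample_acf and acf_bound are illustrative and not part of any package. The final line shows why a few crossings are expected: if 10 lags are each compared with a 95% bound, the chance that at least one crosses it under pure noise is roughly 40% (treating the lags as independent).</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4b2e8f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from statistics import NormalDist\n",
    "\n",
    "\n",
    "def sample_acf(x, max_lag):\n",
    "    # Sample autocorrelation at lags 0..max_lag, using the biased (1/n)\n",
    "    # autocovariance estimator that R's acf() also uses\n",
    "    x = np.asarray(x, dtype=float)\n",
    "    n = x.size\n",
    "    d = x - x.mean()\n",
    "    c0 = np.dot(d, d) / n\n",
    "    return np.array([np.dot(d[:n - k], d[k:]) / n / c0 for k in range(max_lag + 1)])\n",
    "\n",
    "\n",
    "def acf_bound(n, level=0.95):\n",
    "    # Approximate white-noise bound z / sqrt(n), with z the (1 + level) / 2 normal quantile\n",
    "    z = NormalDist().inv_cdf((1 + level) / 2)\n",
    "    return z / np.sqrt(n)\n",
    "\n",
    "\n",
    "# Hypothetical residual series for one elephant (simulated white noise, not study data)\n",
    "rng = np.random.default_rng(2024)\n",
    "resid = rng.normal(size=30)\n",
    "r = sample_acf(resid, max_lag=10)\n",
    "bound = acf_bound(resid.size)\n",
    "crossings = np.flatnonzero(np.abs(r[1:]) > bound) + 1  # lags > 0 outside the bound\n",
    "print('bound =', round(float(bound), 3), '| lags crossing it:', crossings.tolist())\n",
    "\n",
    "# Chance that at least one of 10 independent lags crosses a 95% bound under pure noise\n",
    "print('P(at least one crossing) =', round(1 - 0.95 ** 10, 3))"
   ]
  },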
  {
   "cell_type": "markdown",
   "id": "befa9b0c",
   "metadata": {},
   "source": [
    "<h2>Figures that correspond to new alpha diversity results</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 372,
   "id": "b053ef94",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# Extract posterior draws\n",
    "posterior_samples <- as.data.frame(model_brms_Grazing)\n",
    "\n",
    "# Identify fixed effect columns (excluding intercept)\n",
    "fixed_effects_cols <- grep(\"^b_(?!Intercept)\", names(posterior_samples), perl = TRUE)\n",
    "fixed_effects <- posterior_samples[, fixed_effects_cols]\n",
    "\n",
    "# Rename columns for custom y-axis labels\n",
    "colnames(fixed_effects) <- c(\n",
    "  \"Sex (ref = male)\",\n",
    "  \"Age\",\n",
    "  \"Livestock in reserve\",\n",
    "  \"Grazing (vs browsing)\",\n",
    "  \"Time since study start\",\n",
    "  \"Sex × Age interaction\"\n",
    ")\n",
    "\n",
    "# Plot with spacing adjustment\n",
    "mcmc_areas(\n",
    "  fixed_effects,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  labs(\n",
    "    x = \"Effect Size\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  scale_x_continuous(limits = c(-1, 1)) +\n",
    "  theme(\n",
    "    axis.text.y = element_text(size = 16),\n",
    "    axis.text.x = element_text(size = 14),\n",
    "    axis.title.x = element_text(size = 16, margin = margin(t = 15)),\n",
    "    plot.title = element_blank()\n",
    "  )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 373,
   "id": "8bca4c31",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#Elephant ID\n",
    "\n",
    "# Extract the column as a 1-column data frame\n",
    "elephant_sd <- posterior_samples[, \"sd_Elephant_ID__Intercept\", drop = FALSE]\n",
    "\n",
    "# Compute 95% credible interval\n",
    "ci_bounds <- quantile(elephant_sd[[1]], probs = c(0.025, 0.975))\n",
    "\n",
    "# Plot\n",
    "mcmc_areas(\n",
    "  elephant_sd,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  scale_x_continuous(breaks = seq(0, 0.35, by = 0.05), limits = c(0, 0.35)) +\n",
    "  labs(\n",
    "    x = \"Standard Deviation\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  ggtitle(\"Elephant ID\") +\n",
    "  theme(\n",
    "    axis.text.y = element_blank(),\n",
    "    axis.ticks.y = element_blank(),\n",
    "    axis.title.y = element_blank(),\n",
    "    axis.text.x = element_text(size = 18),\n",
    "    axis.title.x = element_text(size = 18, margin = margin(t = 15)),\n",
    "    plot.title = element_text(size = 20, hjust = 0.5,vjust = -5, face=\"bold\")\n",
    "  )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 374,
   "id": "e7af2bb7",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#Plate\n",
    "\n",
    "# Extract the column as a 1-column data frame\n",
    "plate_sd <- posterior_samples[, \"sd_Plate__Intercept\", drop = FALSE]\n",
    "\n",
    "# Plot\n",
    "mcmc_areas(\n",
    "  plate_sd,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  scale_x_continuous(breaks = seq(0, 3, by = 0.5), limits = c(0, 3)) +\n",
    "  labs(\n",
    "    x = \"Standard Deviation\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  ggtitle(\"Laboratory analysis plate ('Plate')\") +\n",
    "  theme(\n",
    "    axis.text.y = element_blank(),\n",
    "    axis.ticks.y = element_blank(),\n",
    "    axis.title.y = element_blank(),\n",
    "    axis.text.x = element_text(size = 18),\n",
    "    axis.title.x = element_text(size = 18, margin = margin(t = 15)),\n",
    "    plot.title = element_text(size = 20, hjust = 0.5, vjust = -5, face = \"bold\")\n",
    "  )\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f2c4026",
   "metadata": {},
   "source": [
    "<h1><font color='red'>Post-Review Beta Diversity Analysis: the analysis below tests how the predictor variables relate to the community composition of the elephants' gut microbiomes.</font></h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da8f5a70",
   "metadata": {},
   "source": [
    "<h3>Convert the DEICODE biplot into a form that can be used in R</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 375,
   "id": "d111dc48",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import needed packages (qiime2 was imported at the top of the notebook)\n",
    "import pandas as pd\n",
    "from skbio import OrdinationResults\n",
    "\n",
    "# # Load the DEICODE biplot artifact\n",
    "# biplot = qiime2.Artifact.load('ordination.qza')\n",
    "\n",
    "# # Convert the artifact to a scikit-bio OrdinationResults object\n",
    "# ordination: OrdinationResults = biplot.view(OrdinationResults)\n",
    "\n",
    "# # Extract sample coordinates as a DataFrame\n",
    "# df = ordination.samples.copy()\n",
    "# df.index.name = \"SampleID\"\n",
    "\n",
    "# # Rename the axis columns before saving, so ordination.tsv already uses Axis1-Axis3\n",
    "# df.columns = ['Axis1', 'Axis2', 'Axis3']\n",
    "\n",
    "# # Save to ordination.tsv\n",
    "# df.to_csv(\"ordination.tsv\", sep=\"\\t\")\n",
    "\n",
    "# # Preview the output\n",
    "# print(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c5d0af1",
   "metadata": {},
   "source": [
    "<h3>Move to the R environment and run a multivariate Bayesian analysis with the brms package.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 376,
   "id": "272e4b85",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "ordination <- read.delim(\"ordination.tsv\", sep = \"\\t\", header = TRUE) #Read in ordination data\n",
    "head(ordination) #Check to make sure everything looks good"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 377,
   "id": "853cf913",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#merge with metadata\n",
    "metadata <- read.csv(\"metadata_2.csv\", header = TRUE)\n",
    "colnames(metadata)[colnames(metadata) == \"X.SampleID\"] <- \"SampleID\"\n",
    "merged <- merge(ordination, metadata, by = \"SampleID\")\n",
    "\n",
    "#Check data distributions to select proper priors\n",
    "hist(merged$Axis1)"
   ]
  },
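  {
   "cell_type": "markdown",
   "id": "e5c3f902",
   "metadata": {},
   "source": [
    "<h3>R's merge() performs an inner join by default (all = FALSE), so any sample present in only one of the two files is dropped without a warning. A minimal pandas sketch of how to list such samples before modeling; the two small tables are hypothetical stand-ins for ordination.tsv and metadata_2.csv, not the study data.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6d40a13",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy tables standing in for ordination.tsv and metadata_2.csv\n",
    "ordination = pd.DataFrame({'SampleID': ['S1', 'S2', 'S3'],\n",
    "                           'Axis1': [0.10, -0.20, 0.05]})\n",
    "metadata = pd.DataFrame({'SampleID': ['S1', 'S2', 'S4'],\n",
    "                         'Elephant_ID': ['E1', 'E1', 'E2']})\n",
    "\n",
    "# An outer join with indicator=True labels each row 'both', 'left_only', or 'right_only'\n",
    "check = ordination.merge(metadata, on='SampleID', how='outer', indicator=True)\n",
    "unmatched = check.loc[check['_merge'] != 'both', ['SampleID', '_merge']]\n",
    "print(unmatched)\n",
    "\n",
    "# The inner join that R's merge(ordination, metadata, by = 'SampleID') performs by default\n",
    "merged = ordination.merge(metadata, on='SampleID', how='inner')\n",
    "print(len(merged), 'of', len(ordination), 'ordination samples kept')"
   ]
  },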
  {
   "cell_type": "code",
   "execution_count": 378,
   "id": "5f5641d1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "hist(merged$Axis2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 379,
   "id": "fcf79f47",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "hist(merged$Axis3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 380,
   "id": "2b652cca",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "median(merged$Axis1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 381,
   "id": "774b80c0",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "median(merged$Axis2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 382,
   "id": "f0adcff3",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "median(merged$Axis3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 383,
   "id": "ffcc764b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "max(merged$Axis1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 384,
   "id": "1f27ec95",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "max(merged$Axis2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 385,
   "id": "dab0ef9f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "max(merged$Axis3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 386,
   "id": "87097450",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "min(merged$Axis1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 387,
   "id": "a658a63e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "min(merged$Axis2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 388,
   "id": "9d1f1023",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "min(merged$Axis3)"
   ]
  },
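  {
   "cell_type": "markdown",
   "id": "07e51b24",
   "metadata": {},
   "source": [
    "<h3>The nine cells above report the median, maximum, and minimum of each axis one at a time. A compact alternative, sketched in pandas with hypothetical toy scores in place of the merged data (R's summary() reports the same statistics), computes all of them in one table.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18f62c35",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy ordination scores standing in for the merged data frame\n",
    "merged = pd.DataFrame({\n",
    "    'Axis1': [0.12, -0.30, 0.05, 0.21],\n",
    "    'Axis2': [-0.08, 0.15, 0.02, -0.11],\n",
    "    'Axis3': [0.30, -0.25, 0.28, -0.27],\n",
    "})\n",
    "\n",
    "# Median, minimum, and maximum of every axis in a single table\n",
    "ranges = merged[['Axis1', 'Axis2', 'Axis3']].agg(['median', 'min', 'max'])\n",
    "print(ranges)"
   ]
  },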
  {
   "cell_type": "markdown",
   "id": "26848449",
   "metadata": {},
   "source": [
    "<h2>Next, we choose priors, shown below. We fit a separate model to each axis and tune its priors until that model fits well, and only then combine the three axes into one multivariate model. Because multivariate models tend to be harder to fit cleanly, finding well-fitting priors for each axis on its own first should improve the fit of the combined model.</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84334a3a",
   "metadata": {},
   "source": [
    "<h3>To save time and space, each axis model has a single cell; the priors were revised within that cell until an acceptable fit was found.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 389,
   "id": "1b8b80d0",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# library(brms)\n",
    "# library(ggplot2)\n",
    "# library(dplyr)\n",
    "# library(nlme)\n",
    "# library(bayesplot)\n",
    "\n",
    "# #First rename ordination axis columns to avoid confusion\n",
    "# colnames(merged)[colnames(merged) == \"X0\"] <- \"Axis1\"\n",
    "# colnames(merged)[colnames(merged) == \"X1\"] <- \"Axis2\"\n",
    "# colnames(merged)[colnames(merged) == \"X2\"] <- \"Axis3\"\n",
    "# head(merged)\n",
    "\n",
    "# merged$Plate <- as.factor(merged$Plate)\n",
    "# merged$Sex <- as.factor(merged$Sex)\n",
    "# merged$Elephant_ID <- as.factor(merged$Elephant_ID)\n",
    "# merged$Grazing <- as.factor(merged$Grazing)\n",
    "\n",
    "# merged <- as.data.frame(merged)\n",
    "\n",
    "# merged$Date.Sampled <- as.Date(merged$Date.Sampled,format=\"%m/%d/%y\")\n",
    "# merged$Time <- as.numeric(merged$Date.Sampled)\n",
    "# merged$Time_stdz <- scale(merged$Time)\n",
    "# merged <- merged[order(merged$Elephant_ID, merged$Time), ]\n",
    "\n",
    "# #Define priors\n",
    "# # Priors were tightened substantially because posterior predictive checks showed that this improved model fit\n",
    "\n",
    "# priors <- c(\n",
    "#   prior(student_t(3,0,0.01), class = \"Intercept\"), \n",
    "#   prior(student_t(3,0,0.01), class = \"b\"), \n",
    "#   prior(exponential(100), class = \"sd\"), \n",
    "#   prior(exponential(100), class = \"sds\"), \n",
    "#   prior(exponential(100), class = \"sigma\"), \n",
    "#   prior(gamma(2, 0.1), class = \"nu\") # Student-t degrees of freedom; gamma(2, 0.1) (prior mean 20, also the brms default) aims for moderately narrow tails\n",
    "# )\n",
    "\n",
    "\n",
    "# axis1 <- brm(\n",
    "#   bf(Axis1 ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        # Factor-smooth spline of time for each elephant; it helps account for the weak temporal autocorrelation within elephants.\n",
    "#        # More explicit autocorrelation structures could not be used because sampling intervals were irregular and some elephants\n",
    "#        # had only one sample (Adams 2023; Zuur et al. 2009 on not needing a perfect structure; MacNab and Dean 2001 on splines\n",
    "#        # for temporal modeling). More common structures such as gp() were problematic, possibly because too many elephants had\n",
    "#        # only one sample. Thin plate regression splines (bs = \"tp\"): Wood 2003, 2017.\n",
    "#        s(Time, Elephant_ID, bs = \"fs\", xt = list(bs = \"tp\")) +\n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate)\n",
    "#      ),\n",
    "#   data = merged,\n",
    "#   family = student(), # A Gaussian likelihood was tried first, but its predictive distribution was too wide to capture the spread of the data\n",
    "#   prior = priors,\n",
    "#   chains = 4,\n",
    "#   cores = 4,\n",
    "#   iter = 8000,\n",
    "#   warmup = 4000,\n",
    "#   control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    "# )\n",
    "\n",
    "# # Save the fitted model after the initial run; later cells reload it with readRDS()\n",
    "\n",
    "# saveRDS(axis1, \"axis_1_brms.rds\")"
   ]
  },
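  {
   "cell_type": "markdown",
   "id": "29073d46",
   "metadata": {},
   "source": [
    "<h3>A quick way to see how tight these priors are is to compute their central 95% intervals. The sketch below uses scipy.stats with the distributions from the Axis 1 cell, assuming Stan's parameterizations: student_t(df, location, scale), exponential(rate), and gamma(shape, rate). For example, student_t(3, 0, 0.01) places 95% of its mass within roughly ±0.032, which can be compared with the axis ranges printed above.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a184e57",
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import stats\n",
    "\n",
    "# Priors from the Axis 1 cell, assuming Stan's parameterizations:\n",
    "# student_t(df, location, scale), exponential(rate), gamma(shape, rate)\n",
    "intercept_b = stats.t(df=3, loc=0, scale=0.01)   # classes Intercept and b\n",
    "sd_sigma = stats.expon(scale=1 / 100)            # classes sd, sds, sigma: exponential(100)\n",
    "nu = stats.gamma(a=2, scale=1 / 0.1)             # class nu: gamma(2, 0.1)\n",
    "\n",
    "for name, dist in [('student_t(3, 0, 0.01)', intercept_b),\n",
    "                   ('exponential(100)', sd_sigma),\n",
    "                   ('gamma(2, 0.1)', nu)]:\n",
    "    lo, hi = dist.ppf([0.025, 0.975])\n",
    "    print(f'{name}: mean = {dist.mean():.4f}, central 95% interval = ({lo:.4f}, {hi:.4f})')"
   ]
  },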
  {
   "cell_type": "markdown",
   "id": "2e2f38b5",
   "metadata": {},
   "source": [
    "<h3>Posterior predictive checks for the Axis 1 model</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 390,
   "id": "2edfbb49",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "axis1 <- readRDS(\"axis_1_brms.rds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 391,
   "id": "c6907386",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(axis1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 392,
   "id": "7914cda6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(axis1, type=\"boxplot\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a8cb127",
   "metadata": {},
   "source": [
    "<h3>The model fits well. Next, we check the results of the Axis 1 model.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 393,
   "id": "4efc470e",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "round(posterior_summary(axis1, robust = TRUE), 5) # robust = TRUE reports posterior medians and MADs; rounding to 5 decimal places keeps the small estimates readable"
   ]
  },
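  {
   "cell_type": "markdown",
   "id": "4b295f68",
   "metadata": {},
   "source": [
    "<h3>With robust = TRUE, posterior_summary() reports each parameter's posterior median plus its 2.5% and 97.5% quantiles. A common reading, and presumably the one behind the notes below, treats a fixed effect as showing an effect when that 95% credible interval excludes zero. A minimal numpy sketch of that rule, applied to hypothetical simulated draws rather than the fitted model (16,000 draws, matching 4 chains × 4,000 post-warmup iterations). The function summarize_draws is illustrative; in R, as.data.frame(axis1) returns the real draws.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c3a6079",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def summarize_draws(draws, prob=0.95):\n",
    "    # Posterior median, central credible interval, and probability of direction\n",
    "    # (share of draws with the same sign as the median)\n",
    "    draws = np.asarray(draws, dtype=float)\n",
    "    lower, upper = np.quantile(draws, [(1 - prob) / 2, (1 + prob) / 2])\n",
    "    med = np.median(draws)\n",
    "    p_dir = np.mean(draws > 0) if med >= 0 else np.mean(draws < 0)\n",
    "    return {'median': med, 'lower': lower, 'upper': upper,\n",
    "            'excludes_zero': bool(lower > 0 or upper < 0),\n",
    "            'p_direction': p_dir}\n",
    "\n",
    "\n",
    "# Hypothetical draws for two fixed effects (simulated, not taken from axis1)\n",
    "rng = np.random.default_rng(1)\n",
    "example = {\n",
    "    'b_Time_stdz': rng.normal(0.05, 0.01, size=16000),\n",
    "    'b_Age.at.Sampling.stdz': rng.normal(0.0, 0.02, size=16000),\n",
    "}\n",
    "for name, d in example.items():\n",
    "    s = summarize_draws(d)\n",
    "    print(name, round(float(s['lower']), 4), round(float(s['upper']), 4),\n",
    "          'excludes zero' if s['excludes_zero'] else 'includes zero')"
   ]
  },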
  {
   "cell_type": "markdown",
   "id": "feacc2dc",
   "metadata": {},
   "source": [
    "<h3>Axis 1 appears to be associated only with elephant ID and plate. Plots below.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 394,
   "id": "839f0460",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# Extract posterior draws\n",
    "posterior_samples <- as.data.frame(axis1)\n",
    "\n",
    "# Identify fixed effect columns (excluding intercept)\n",
    "fixed_effects_cols <- grep(\"^b_(?!Intercept)\", names(posterior_samples), perl = TRUE)\n",
    "fixed_effects <- posterior_samples[, fixed_effects_cols]\n",
    "\n",
    "# Rename columns for custom y-axis labels\n",
    "colnames(fixed_effects) <- c(\n",
    "  \"Sex (ref = male)\",\n",
    "  \"Age\",\n",
    "  \"Livestock in reserve\",\n",
    "  \"Grazing (vs browsing)\",\n",
    "  \"Time since study start\",\n",
    "  \"Sex × Age interaction\"\n",
    ")\n",
    "\n",
    "# Plot with spacing adjustment\n",
    "mcmc_areas(\n",
    "  fixed_effects,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  labs(\n",
    "    x = \"Effect Size\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  theme(\n",
    "    axis.text.y = element_text(size = 16),\n",
    "    axis.text.x = element_text(size = 14),\n",
    "    axis.title.x = element_text(size = 16, margin = margin(t = 15)),\n",
    "    plot.title = element_blank()\n",
    "  )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 395,
   "id": "04a986c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#Elephant ID\n",
    "\n",
    "# Extract the column as a 1-column data frame\n",
    "elephant_sd <- posterior_samples[, \"sd_Elephant_ID__Intercept\", drop = FALSE]\n",
    "\n",
    "# Compute 95% credible interval\n",
    "ci_bounds <- quantile(elephant_sd[[1]], probs = c(0.025, 0.975))\n",
    "\n",
    "# Plot\n",
    "mcmc_areas(\n",
    "  elephant_sd,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  labs(\n",
    "    x = \"Standard Deviation\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  ggtitle(\"Elephant ID\") +\n",
    "  theme(\n",
    "    axis.text.y = element_blank(),\n",
    "    axis.ticks.y = element_blank(),\n",
    "    axis.title.y = element_blank(),\n",
    "    axis.text.x = element_text(size = 18),\n",
    "    axis.title.x = element_text(size = 18, margin = margin(t = 15)),\n",
    "    plot.title = element_text(size = 20, hjust = 0.5,vjust = -5, face=\"bold\")\n",
    "  )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 396,
   "id": "b1c3ad50",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#Plate\n",
    "\n",
    "# Extract the column as a 1-column data frame\n",
    "plate_sd <- posterior_samples[, \"sd_Plate__Intercept\", drop = FALSE]\n",
    "\n",
    "# Plot\n",
    "mcmc_areas(\n",
    "  plate_sd,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  labs(\n",
    "    x = \"Standard Deviation\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  ggtitle(\"Laboratory analysis plate ('Plate')\") +\n",
    "  theme(\n",
    "    axis.text.y = element_blank(),\n",
    "    axis.ticks.y = element_blank(),\n",
    "    axis.title.y = element_blank(),\n",
    "    axis.text.x = element_text(size = 18),\n",
    "    axis.title.x = element_text(size = 18, margin = margin(t = 15)),\n",
    "    plot.title = element_text(size = 20, hjust = 0.5, vjust = -5, face = \"bold\")\n",
    "  )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbf63448",
   "metadata": {},
   "source": [
    "<h3>Check explained variation for the Axis 1 model</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 397,
   "id": "310f18f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "bayes_R2(axis1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62f80aad",
   "metadata": {},
   "source": [
    "<h3>The Axis 1 model explains between 4.8% and 16.6% of the variation in Axis 1 scores (95% credible interval of the Bayesian R²).</h3>"
   ]
  },
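  {
   "cell_type": "markdown",
   "id": "6d4b718a",
   "metadata": {},
   "source": [
    "<h3>bayes_R2() returns one R² value per posterior draw, and the range reported above is a credible interval of those values. A minimal numpy sketch of the residual-based definition of Bayesian R² that, as far as we understand, brms uses: for each draw, the variance of the fitted values divided by that variance plus the residual variance. The data and draws below are simulated and hypothetical; in R, posterior_epred(axis1) would supply the real draws of fitted values.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e5c829b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def bayes_r2(y, ypred):\n",
    "    # Residual-based Bayesian R2, one value per posterior draw:\n",
    "    # var(fitted) / (var(fitted) + var(residuals)), variances taken across observations\n",
    "    y = np.asarray(y, dtype=float)\n",
    "    ypred = np.asarray(ypred, dtype=float)  # shape: (draws, observations)\n",
    "    resid = y[np.newaxis, :] - ypred\n",
    "    var_fit = ypred.var(axis=1, ddof=1)\n",
    "    var_res = resid.var(axis=1, ddof=1)\n",
    "    return var_fit / (var_fit + var_res)\n",
    "\n",
    "\n",
    "# Hypothetical example: 200 observations and 1,000 posterior draws of the fitted values\n",
    "rng = np.random.default_rng(42)\n",
    "x = rng.normal(size=200)\n",
    "y = 0.3 * x + rng.normal(size=200)\n",
    "slopes = rng.normal(0.3, 0.05, size=1000)\n",
    "ypred = slopes[:, np.newaxis] * x[np.newaxis, :]\n",
    "\n",
    "r2 = bayes_r2(y, ypred)\n",
    "print('median R2:', round(float(np.median(r2)), 3))\n",
    "print('95% interval:', np.round(np.quantile(r2, [0.025, 0.975]), 3))"
   ]
  },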
  {
   "cell_type": "markdown",
   "id": "9909ab02",
   "metadata": {},
   "source": [
    "<h3>Move on to Axis 2</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 398,
   "id": "89482b57",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# # Based on posterior predictive checks, the scales of the intercept and fixed-effect (b) priors were loosened for Axis 2, and the intercept prior was shifted to the right (location 0.1)\n",
    "\n",
    "# priors <- c(\n",
    "#   prior(student_t(3,0.1,0.05), class = \"Intercept\"), \n",
    "#   prior(student_t(3,0,0.1), class = \"b\"), \n",
    "#   prior(exponential(100), class = \"sd\"), \n",
    "#   prior(exponential(100), class = \"sds\"), \n",
    "#   prior(exponential(100), class = \"sigma\"), \n",
    "#   prior(gamma(2, 0.1), class = \"nu\")\n",
    "# )\n",
    "\n",
    "# axis2 <- brm(\n",
    "#   bf(Axis2 ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = \"fs\", xt = list(bs = \"tp\")) + # Factor-smooth spline of time per elephant; see the Axis 1 model cell for the rationale and sources\n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate)\n",
    "#      ),\n",
    "#   data = merged,\n",
    "#   family = student(),\n",
    "#   prior = priors,\n",
    "#   chains = 4,\n",
    "#   cores = 4,\n",
    "#   iter = 8000,\n",
    "#   warmup = 4000,\n",
    "#   control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    "# )\n",
    "\n",
    "# # Save the fitted model after the initial run; later cells reload it with readRDS()\n",
    "\n",
    "# saveRDS(axis2, \"axis_2_brms.rds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "587d0bf4",
   "metadata": {},
   "source": [
    "<h3>Posterior predictive checks for the Axis 2 model</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 399,
   "id": "f07f26cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "axis2 <- readRDS(\"axis_2_brms.rds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 400,
   "id": "8f716e6f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(axis2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 401,
   "id": "3dd66925",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(axis2, type=\"boxplot\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24635130",
   "metadata": {},
   "source": [
    "<h3>The fit is not as good as for Axis 1, but it is acceptable. Reaching the current fit required extensive adjustment of the priors, and some posterior predictive draws still underestimate the mean.</h3>"
   ]
  },
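  {
   "cell_type": "markdown",
   "id": "8f6d93ac",
   "metadata": {},
   "source": [
    "<h3>One way to quantify how often draws underestimate the mean is to compare the mean of each replicated dataset with the observed mean. A share close to 0 or 1 points to systematic bias, while a share near 0.5 does not. In brms, pp_check(axis2, type = 'stat', stat = 'mean') plots this comparison. The numpy sketch below uses hypothetical simulated data, not the Axis 2 model.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "907ea4bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical observed scores and posterior predictive replicates (rows = draws);\n",
    "# the replicates are deliberately centered slightly too low\n",
    "rng = np.random.default_rng(7)\n",
    "y_obs = rng.normal(0.05, 0.1, size=150)\n",
    "y_rep = rng.normal(0.03, 0.1, size=(4000, 150))\n",
    "\n",
    "# Test statistic: the mean of each replicated dataset versus the observed mean\n",
    "rep_means = y_rep.mean(axis=1)\n",
    "share_below = float(np.mean(rep_means < y_obs.mean()))\n",
    "print(f'observed mean = {y_obs.mean():.4f}')\n",
    "print(f'share of draws with a lower mean than observed = {share_below:.3f}')"
   ]
  },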
  {
   "cell_type": "code",
   "execution_count": 402,
   "id": "13866355",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "round(posterior_summary(axis2, robust = TRUE), 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff72a35c",
   "metadata": {},
   "source": [
    "<h3>Livestock, grazing, and especially time since the start of the study show effects on the Axis 2 scores. As before, plate and elephant ID are also associated with the ordination.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 403,
   "id": "6a8bf82a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# Extract posterior draws\n",
    "posterior_samples <- as.data.frame(axis2)\n",
    "\n",
    "# Identify fixed effect columns (excluding intercept)\n",
    "fixed_effects_cols <- grep(\"^b_(?!Intercept)\", names(posterior_samples), perl = TRUE)\n",
    "fixed_effects <- posterior_samples[, fixed_effects_cols]\n",
    "\n",
    "# Rename columns for custom y-axis labels\n",
    "colnames(fixed_effects) <- c(\n",
    "  \"Sex (ref = male)\",\n",
    "  \"Age\",\n",
    "  \"Livestock in reserve\",\n",
    "  \"Grazing (vs browsing)\",\n",
    "  \"Time since study start\",\n",
    "  \"Sex × Age interaction\"\n",
    ")\n",
    "\n",
    "# Plot with spacing adjustment\n",
    "mcmc_areas(\n",
    "  fixed_effects,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  labs(\n",
    "    x = \"Effect Size\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  theme(\n",
    "    axis.text.y = element_text(size = 16),\n",
    "    axis.text.x = element_text(size = 14),\n",
    "    axis.title.x = element_text(size = 16, margin = margin(t = 15)),\n",
    "    plot.title = element_blank()\n",
    "  )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 404,
   "id": "9c0b02c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#Elephant ID\n",
    "\n",
    "# Extract the column as a 1-column data frame\n",
    "elephant_sd <- posterior_samples[, \"sd_Elephant_ID__Intercept\", drop = FALSE]\n",
    "\n",
    "# Compute 95% credible interval\n",
    "ci_bounds <- quantile(elephant_sd[[1]], probs = c(0.025, 0.975))\n",
    "\n",
    "# Plot\n",
    "mcmc_areas(\n",
    "  elephant_sd,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  labs(\n",
    "    x = \"Standard Deviation\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  ggtitle(\"Elephant ID\") +\n",
    "  theme(\n",
    "    axis.text.y = element_blank(),\n",
    "    axis.ticks.y = element_blank(),\n",
    "    axis.title.y = element_blank(),\n",
    "    axis.text.x = element_text(size = 18),\n",
    "    axis.title.x = element_text(size = 18, margin = margin(t = 15)),\n",
    "    plot.title = element_text(size = 20, hjust = 0.5,vjust = -5, face=\"bold\")\n",
    "  )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 405,
   "id": "59b5b30f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#Plate\n",
    "\n",
    "# Extract the column as a 1-column data frame\n",
    "plate_sd <- posterior_samples[, \"sd_Plate__Intercept\", drop = FALSE]\n",
    "\n",
    "# Plot\n",
    "mcmc_areas(\n",
    "  plate_sd,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  labs(\n",
    "    x = \"Standard Deviation\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  ggtitle(\"Laboratory analysis plate ('Plate')\") +\n",
    "  theme(\n",
    "    axis.text.y = element_blank(),\n",
    "    axis.ticks.y = element_blank(),\n",
    "    axis.title.y = element_blank(),\n",
    "    axis.text.x = element_text(size = 18),\n",
    "    axis.title.x = element_text(size = 18, margin = margin(t = 15)),\n",
    "    plot.title = element_text(size = 20, hjust = 0.5, vjust = -5, face = \"bold\")\n",
    "  )\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "654303b8",
   "metadata": {},
   "source": [
    "<h3>Check explained variation for the Axis 2 model</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 406,
   "id": "9269c5f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "bayes_R2(axis2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4023d644",
   "metadata": {},
   "source": [
    "<h3>The Axis 2 model explains between 4.4% and 40.2% of the variation in Axis 2 scores (95% credible interval of the Bayesian R²).</h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "965d63b9",
   "metadata": {},
   "source": [
    "<h3>Move on to Axis 3</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 407,
   "id": "f64a8f85",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# # Axis 3: the intercept prior is centered back at 0 (based on the data), with a scale of 0.05 (wider than the 0.01 used for Axis 1)\n",
    "\n",
    "# priors <- c(\n",
    "#   prior(student_t(3,0,0.05), class = \"Intercept\"), \n",
    "#   prior(normal(0,.1), class = \"b\"), #Fixed effects were also changed to a normal distribution based on posterior predictive checks\n",
    "#   prior(exponential(100), class = \"sd\"), \n",
    "#   prior(exponential(100), class = \"sds\"), \n",
    "#   prior(exponential(100), class = \"sigma\"), \n",
    "#   prior(gamma(2, 0.1), class = \"nu\")\n",
    "# )\n",
    "\n",
    "\n",
    "# axis3 <- brm(\n",
    "#   bf(Axis3 ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = \"fs\", xt = list(bs = \"tp\")) + # Factor-smooth spline of time per elephant; see the Axis 1 model cell for the rationale and sources\n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate)\n",
    "#      ),\n",
    "#   data = merged,\n",
    "#   family = student(),\n",
    "#   prior = priors,\n",
    "#   chains = 4,\n",
    "#   cores = 4,\n",
    "#   iter = 8000,\n",
    "#   warmup = 4000,\n",
    "#   control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    "# )\n",
    "\n",
    "# # Save the fitted model after the initial run; later cells reload it with readRDS()\n",
    "\n",
    "# saveRDS(axis3, \"axis_3_brms.rds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "784a1687",
   "metadata": {},
   "source": [
    "<h3>Posterior predictive checks for the Axis 3 model</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 408,
   "id": "ccf632be",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "axis3 <- readRDS(\"axis_3_brms.rds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 409,
   "id": "7ba822c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(axis3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 410,
   "id": "d1d23624",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(axis3, type=\"boxplot\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "558e7900",
   "metadata": {},
   "source": [
    "<h3>The fit looks good, given that the Axis 3 scores are bimodal.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 411,
   "id": "31a35998",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "round(posterior_summary(axis3, robust = TRUE), 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe56b6ef",
   "metadata": {},
   "source": [
    "<h3>Time, elephant ID, and plate show associations with Axis 3.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 412,
   "id": "decc79ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# Extract posterior draws\n",
    "posterior_samples <- as.data.frame(axis3)\n",
    "\n",
    "# Identify fixed effect columns (excluding intercept)\n",
    "fixed_effects_cols <- grep(\"^b_(?!Intercept)\", names(posterior_samples), perl = TRUE)\n",
    "fixed_effects <- posterior_samples[, fixed_effects_cols]\n",
    "\n",
    "# Rename columns for custom y-axis labels\n",
    "colnames(fixed_effects) <- c(\n",
    "  \"Sex (ref = male)\",\n",
    "  \"Age\",\n",
    "  \"Livestock in reserve\",\n",
    "  \"Grazing (vs browsing)\",\n",
    "  \"Time since study start\",\n",
    "  \"Sex × Age interaction\"\n",
    ")\n",
    "\n",
    "# Plot with spacing adjustment\n",
    "mcmc_areas(\n",
    "  fixed_effects,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  labs(\n",
    "    x = \"Effect Size\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  theme(\n",
    "    axis.text.y = element_text(size = 16),\n",
    "    axis.text.x = element_text(size = 14),\n",
    "    axis.title.x = element_text(size = 16, margin = margin(t = 15)),\n",
    "    plot.title = element_blank()\n",
    "  )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 413,
   "id": "a29c1074",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#Elephant ID\n",
    "\n",
    "# Extract the column as a 1-column data frame\n",
    "elephant_sd <- posterior_samples[, \"sd_Elephant_ID__Intercept\", drop = FALSE]\n",
    "\n",
    "# Compute 95% credible interval\n",
    "ci_bounds <- quantile(elephant_sd[[1]], probs = c(0.025, 0.975))\n",
    "\n",
    "# Plot\n",
    "mcmc_areas(\n",
    "  elephant_sd,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  labs(\n",
    "    x = \"Standard Deviation\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  ggtitle(\"Elephant ID\") +\n",
    "  theme(\n",
    "    axis.text.y = element_blank(),\n",
    "    axis.ticks.y = element_blank(),\n",
    "    axis.title.y = element_blank(),\n",
    "    axis.text.x = element_text(size = 18),\n",
    "    axis.title.x = element_text(size = 18, margin = margin(t = 15)),\n",
    "    plot.title = element_text(size = 20, hjust = 0.5,vjust = -5, face=\"bold\")\n",
    "  )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 414,
   "id": "42cf7118",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#Plate\n",
    "\n",
    "# Extract the column as a 1-column data frame\n",
    "plate_sd <- posterior_samples[, \"sd_Plate__Intercept\", drop = FALSE]\n",
    "\n",
    "# Plot\n",
    "mcmc_areas(\n",
    "  plate_sd,\n",
    "  prob = 0.95\n",
    ") +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"red\") +\n",
    "  labs(\n",
    "    x = \"Standard Deviation\",\n",
    "    y = NULL\n",
    "  ) +\n",
    "  ggtitle(\"Laboratory analysis plate ('Plate')\") +\n",
    "  theme(\n",
    "    axis.text.y = element_blank(),\n",
    "    axis.ticks.y = element_blank(),\n",
    "    axis.title.y = element_blank(),\n",
    "    axis.text.x = element_text(size = 18),\n",
    "    axis.title.x = element_text(size = 18, margin = margin(t = 15)),\n",
    "    plot.title = element_text(size = 20, hjust = 0.5, vjust = -5, face = \"bold\")\n",
    "  )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee0b4034",
   "metadata": {},
   "source": [
    "<h3>Check explained variation (Bayesian R²)</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 415,
   "id": "30347e11",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "bayes_R2(axis3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4a5089d",
   "metadata": {},
   "source": [
    "<h3>The 95% credible interval for the Axis 3 model's Bayesian R² spans 7.5% to 24.9% of the variation in the data.</h3>"
   ]
  },
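  {
   "cell_type": "markdown",
   "id": "e3b9c6a1",
   "metadata": {},
   "source": [
    "<h4>For reference, a sketch of the quantity behind this interval, as computed by <code>bayes_R2()</code> in brms (our notation: $\\hat{y}_n^{(s)}$ is the expected value for observation $n$ under posterior draw $s$, and $e_n^{(s)} = y_n - \\hat{y}_n^{(s)}$ is the corresponding residual):</h4>\n",
    "\n",
    "$$R^2_{(s)} = \\frac{\\mathrm{Var}_{n}\\left(\\hat{y}_n^{(s)}\\right)}{\\mathrm{Var}_{n}\\left(\\hat{y}_n^{(s)}\\right) + \\mathrm{Var}_{n}\\left(e_n^{(s)}\\right)}$$\n",
    "\n",
    "<h4>One $R^2$ value is computed per posterior draw $s$; the reported 95% interval is given by the 2.5% and 97.5% quantiles of those values.</h4>"
   ]
  },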
  {
   "cell_type": "markdown",
   "id": "0d48399a",
   "metadata": {},
   "source": [
    "<h1>Next, we combine the three axis models into a single multivariate model of the ordination axes.</h1>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 416,
   "id": "fb510426",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# library(brms)\n",
    "# library(ggplot2)\n",
    "# library(dplyr)\n",
    "# library(nlme)\n",
    "# library(bayesplot)\n",
    "\n",
    "# # Define Axis 1 model\n",
    "# axis1_multi <- bf(Axis1 ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = 'fs', xt = list(bs = 'tp')) + \n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate))\n",
    "\n",
    "# # Define Axis 2 model\n",
    "# axis2_multi <- bf(Axis2 ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = 'fs', xt = list(bs = 'tp')) + \n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate))\n",
    "\n",
    "# # Define Axis 3 model\n",
    "# axis3_multi <- bf(Axis3 ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = 'fs', xt = list(bs = 'tp')) + \n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate))\n",
    "\n",
    "# priors_all <- c(\n",
    "#   # Axis1\n",
    "#   prior(student_t(3, 0, 0.01), class = \"Intercept\", resp = \"Axis1\"),\n",
    "#   prior(student_t(3, 0, 0.01), class = \"b\", resp = \"Axis1\"),\n",
    "#   prior(exponential(100), class = \"sd\", resp = \"Axis1\"),\n",
    "#   prior(exponential(100), class = \"sds\", resp = \"Axis1\"),\n",
    "#   prior(exponential(100), class = \"sigma\", resp = \"Axis1\"),\n",
    "\n",
    "#   # Axis2\n",
    "#   prior(student_t(3, 0.1, 0.05), class = \"Intercept\", resp = \"Axis2\"),\n",
    "#   prior(student_t(3, 0, 0.1), class = \"b\", resp = \"Axis2\"),\n",
    "#   prior(exponential(100), class = \"sd\", resp = \"Axis2\"),\n",
    "#   prior(exponential(100), class = \"sds\", resp = \"Axis2\"),\n",
    "#   prior(exponential(100), class = \"sigma\", resp = \"Axis2\"),\n",
    "\n",
    "#   # Axis3\n",
    "#   prior(student_t(3, 0, 0.05), class = \"Intercept\", resp = \"Axis3\"),\n",
    "#   prior(normal(0, 0.1), class = \"b\", resp = \"Axis3\"),\n",
    "#   prior(exponential(100), class = \"sd\", resp = \"Axis3\"),\n",
    "#   prior(exponential(100), class = \"sds\", resp = \"Axis3\"),\n",
    "#   prior(exponential(100), class = \"sigma\", resp = \"Axis3\"),\n",
    "\n",
    "#   # Shared nu prior — do *not* assign `resp` here\n",
    "#   prior(gamma(2, 0.1), class = \"nu\")\n",
    "# )\n",
    "\n",
    "# # Fit the multivariate model\n",
    "# all_axes <- brm(\n",
    "#   axis1_multi + axis2_multi + axis3_multi + set_rescor(TRUE),\n",
    "#   data = merged,\n",
    "#   prior = priors_all,\n",
    "#   family = student(),\n",
    "#   chains = 4,\n",
    "#   cores = 4,\n",
    "#   iter = 8000,\n",
    "#   warmup = 4000,\n",
    "#   control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    "# )\n",
    "\n",
    "# The line below saved the model after the initial run.\n",
    "\n",
    "# saveRDS(all_axes, \"all_axes_model.rds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 417,
   "id": "b68a527a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "all_axes <- readRDS(\"all_axes_model.rds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa37f759",
   "metadata": {},
   "source": [
    "<h3>Check model fit for each axis in turn</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 418,
   "id": "c098dd35",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(all_axes, resp = \"Axis1\") + xlim(-.4,.4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 419,
   "id": "a1dcd121",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(all_axes, resp = \"Axis1\", type=\"boxplot\") + ylim(-.5,.5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 420,
   "id": "f693c3e2",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(all_axes, resp = \"Axis2\") + xlim(-.4,.4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 421,
   "id": "26731b1e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(all_axes, resp = \"Axis2\", type=\"boxplot\") + ylim(-1,1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 422,
   "id": "28e1b31f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(all_axes, resp = \"Axis3\") + xlim(-.4,.4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 423,
   "id": "6338fe9c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(all_axes, resp = \"Axis3\", type=\"boxplot\") + ylim(-.2,.2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32015c81",
   "metadata": {},
   "source": [
    "<h2>Attempt to tighten the predicted variation, which is much wider than observed, even though the central tendency is captured very well.</h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05472ec0",
   "metadata": {},
   "source": [
    "<h3>Two models were saved using the cell below: one with tightened variation and a second (the version shown) with even tighter variation. As before, we reused the same cell while trying different priors to save time and space.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 424,
   "id": "13d08c1f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# library(brms)\n",
    "# library(ggplot2)\n",
    "# library(dplyr)\n",
    "# library(nlme)\n",
    "# library(bayesplot)\n",
    "\n",
    "# # Define Axis 1 model\n",
    "# axis1_multi <- bf(Axis1 ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = 'fs', xt = list(bs = 'tp')) + \n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate))\n",
    "\n",
    "# # Define Axis 2 model\n",
    "# axis2_multi <- bf(Axis2 ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = 'fs', xt = list(bs = 'tp')) + \n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate))\n",
    "\n",
    "# # Define Axis 3 model\n",
    "# axis3_multi <- bf(Axis3 ~\n",
    "#        Sex * Age.at.Sampling.stdz +\n",
    "#        Livestock_avg_of_avgs_stdz +\n",
    "#        Grazing +\n",
    "#        Time_stdz + \n",
    "#        s(Time, Elephant_ID, bs = 'fs', xt = list(bs = 'tp')) + \n",
    "#        (1 | Elephant_ID) +\n",
    "#        (1 | Plate))\n",
    "\n",
    "# priors_all <- c(\n",
    "#   # Axis1\n",
    "#   prior(student_t(3, 0, 0.01), class = \"Intercept\", resp = \"Axis1\"),\n",
    "#   prior(student_t(3, 0, 0.01), class = \"b\", resp = \"Axis1\"),\n",
    "#   prior(normal(0,0.05), class = \"sd\", lb = 0, resp = \"Axis1\"),\n",
    "#   prior(normal(0,0.05), class = \"sds\", lb = 0, resp = \"Axis1\"),\n",
    "#   prior(normal(0,0.05), class = \"sigma\",lb = 0, resp = \"Axis1\"),\n",
    "\n",
    "#   # Axis2\n",
    "#   prior(student_t(3, 0.1, 0.05), class = \"Intercept\", resp = \"Axis2\"),\n",
    "#   prior(student_t(3, 0, 0.1), class = \"b\", resp = \"Axis2\"),\n",
    "#   prior(normal(0,0.05), class = \"sd\", lb = 0, resp = \"Axis2\"),\n",
    "#   prior(normal(0,0.05), class = \"sds\", lb = 0, resp = \"Axis2\"),\n",
    "#   prior(normal(0,0.05), class = \"sigma\",lb = 0, resp = \"Axis2\"),\n",
    "\n",
    "#   # Axis3\n",
    "#   prior(student_t(3, 0, 0.05), class = \"Intercept\", resp = \"Axis3\"),\n",
    "#   prior(normal(0, 0.1), class = \"b\", resp = \"Axis3\"),\n",
    "#   prior(normal(0,0.05), class = \"sd\", lb = 0, resp = \"Axis3\"),\n",
    "#   prior(normal(0,0.05), class = \"sds\", lb = 0, resp = \"Axis3\"),\n",
    "#   prior(normal(0,0.05), class = \"sigma\",lb = 0, resp = \"Axis3\"),\n",
    "\n",
    "#   # Shared nu prior — do *not* assign `resp` here\n",
    "#   prior(gamma(20, 10), class = \"nu\")\n",
    "# )\n",
    "\n",
    "# # Fit the multivariate model\n",
    "# all_axes_tightened_var_even_more <- brm(\n",
    "#   axis1_multi + axis2_multi + axis3_multi + set_rescor(TRUE),\n",
    "#   data = merged,\n",
    "#   prior = priors_all,\n",
    "#   family = student(),\n",
    "#   chains = 4,\n",
    "#   cores = 4,\n",
    "#   iter = 8000,\n",
    "#   warmup = 4000,\n",
    "#   control = list(adapt_delta = 0.9999, max_treedepth = 20)\n",
    "# )\n",
    "\n",
    "# saveRDS(all_axes_tightened_var_even_more, \"all_axes_model_tightened_var_even_more.rds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bee3e31a",
   "metadata": {},
   "source": [
    "<h2>The best fit appears to be the tightened-variation model rather than the \"even more\" model, because the mean for Axis 2 starts to drift in the \"even more\" model.</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 425,
   "id": "fecff7a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "all_axes <- readRDS(\"all_axes_model.rds\")\n",
    "all_axes_tightened_var <- readRDS(\"all_axes_model_tightened_var.rds\")\n",
    "all_axes_tightened_var_even_more <- readRDS(\"all_axes_model_tightened_var_even_more.rds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 426,
   "id": "1a13f212",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(all_axes_tightened_var, resp = \"Axis3\") + xlim(-.4,.4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 427,
   "id": "1d7a8302",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pp_check(all_axes_tightened_var_even_more, resp = \"Axis1\", type=\"boxplot\") + ylim(-.1,.1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6de2bf8",
   "metadata": {},
   "source": [
    "<h3>We use posterior predictive p-values to make the final decision.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 428,
   "id": "60d6c12f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# # Load necessary libraries\n",
    "# library(brms)\n",
    "# library(posterior) # for posterior draws\n",
    "# library(tidyverse)\n",
    "\n",
    "# # Simulate replicated datasets from the posterior\n",
    "# yrep_aa <- posterior_predict(all_axes)  # Array: [n_draws x n_obs x n_axes] for a multivariate model\n",
    "# yrep_aa_tv <- posterior_predict(all_axes_tightened_var)\n",
    "# yrep_aa_tv_em <- posterior_predict(all_axes_tightened_var_even_more)\n",
    "\n",
    "# Observed values for each ordination axis\n",
    "y_obs_a1 <- merged$Axis1\n",
    "y_obs_a2 <- merged$Axis2\n",
    "y_obs_a3 <- merged$Axis3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "706624af",
   "metadata": {},
   "source": [
    "<h2>The cell below saved the replicated datasets after the initial run of the cell above.</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 429,
   "id": "566e5d35",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# save(yrep_aa, yrep_aa_tv, yrep_aa_tv_em , file = \"multivariate_pp_variables.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 430,
   "id": "63c12088",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "load(\"multivariate_pp_variables.RData\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 431,
   "id": "ca5b8546",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# Observed values for all three axes, concatenated into one vector\n",
    "y_obs_all <- c(y_obs_a1, y_obs_a2, y_obs_a3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 432,
   "id": "120179a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "# Define test statistics\n",
    "test_stat_mean <- function(y) mean(y)\n",
    "test_stat_sd   <- function(y) sd(y)\n",
    "\n",
    "# Compute observed test statistics\n",
    "mean_obs <- test_stat_mean(y_obs_all)\n",
    "sd_obs   <- test_stat_sd(y_obs_all)\n",
    "\n",
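    "# Compute test statistics for replicated datasets: for each posterior draw\n",
    "# (margin 1), the statistic is taken over all observations and all three axes\n",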
    "mean_rep_aa <- apply(yrep_aa, 1, test_stat_mean)\n",
    "mean_rep_aa_tv <- apply(yrep_aa_tv, 1, test_stat_mean)\n",
    "mean_rep_aa_tv_em <- apply(yrep_aa_tv_em, 1, test_stat_mean)\n",
    "sd_rep_aa   <- apply(yrep_aa, 1, test_stat_sd)\n",
    "sd_rep_aa_tv   <- apply(yrep_aa_tv, 1, test_stat_sd)\n",
    "sd_rep_aa_tv_em   <- apply(yrep_aa_tv_em, 1, test_stat_sd)\n",
    "\n",
    "# Calculate posterior predictive p-values (share of draws whose replicated statistic >= observed)\n",
    "pval_mean_aa <- mean(mean_rep_aa >= mean_obs)\n",
    "pval_mean_aa_tv <- mean(mean_rep_aa_tv >= mean_obs)\n",
    "pval_mean_aa_tv_em <- mean(mean_rep_aa_tv_em >= mean_obs)\n",
    "\n",
    "pval_sd_aa  <- mean(sd_rep_aa >= sd_obs)\n",
    "pval_sd_aa_tv   <- mean(sd_rep_aa_tv >= sd_obs)\n",
    "pval_sd_aa_tv_em   <- mean(sd_rep_aa_tv_em >= sd_obs)"
   ]
  },
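  {
   "cell_type": "markdown",
   "id": "9f4d2a7c",
   "metadata": {},
   "source": [
    "<h4>For reference, the p-values computed above follow the usual Monte Carlo estimate of a posterior predictive p-value (our notation: $T$ is the test statistic, here the mean or SD; $y$ is the observed values pooled across the three axes; $y^{\\mathrm{rep},(s)}$ is the replicated data from posterior draw $s$; and $S$ is the number of draws):</h4>\n",
    "\n",
    "$$p_B = \\Pr\\left(T(y^{\\mathrm{rep}}) \\ge T(y) \\mid y\\right) \\approx \\frac{1}{S} \\sum_{s=1}^{S} \\mathbb{I}\\left[T\\left(y^{\\mathrm{rep},(s)}\\right) \\ge T(y)\\right]$$\n",
    "\n",
    "<h4>Values near 0.5 indicate that the replicated data reproduce the observed statistic; values near 0 or 1 indicate systematic misfit.</h4>"
   ]
  },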
  {
   "cell_type": "code",
   "execution_count": 433,
   "id": "0433fb03",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#View results\n",
    "\n",
    "#all_axes model\n",
    "\n",
    "print(pval_mean_aa)\n",
    "print(pval_sd_aa)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 434,
   "id": "9c7a3041",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#View results\n",
    "\n",
    "#all_axes_tightened_var model\n",
    "\n",
    "print(pval_mean_aa_tv)\n",
    "print(pval_sd_aa_tv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 435,
   "id": "b45909cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#View results\n",
    "\n",
    "#all_axes_tightened_var_even_more model\n",
    "\n",
    "print(pval_mean_aa_tv_em)\n",
    "print(pval_sd_aa_tv_em)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0536305e",
   "metadata": {},
   "source": [
    "<h2>The second multivariate model (tightened variation) has the best posterior predictive p-values, so we proceed with it.</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 436,
   "id": "065dfa95",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(brms)\n",
    "library(bayesplot)\n",
    "\n",
    "all_axes_model_tightened_var <- readRDS(\"all_axes_model_tightened_var.rds\")\n",
    "\n",
    "bayes_R2(all_axes_model_tightened_var)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab4d7dc5",
   "metadata": {},
   "source": [
    "<h3>View results for the fixed effects (summarizing all parameters at once was too slow)</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 437,
   "id": "ee0a3cc2",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "posterior_samples <- as.data.frame(all_axes_tightened_var)\n",
    "\n",
    "print(colnames(posterior_samples)[1:50])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 438,
   "id": "9024fe70",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(posterior)\n",
    "library(dplyr)\n",
    "\n",
    "\n",
    "#Extract fixed draws\n",
    "\n",
    "fixed_draws <- posterior_samples[, grep(\"^b_\", colnames(posterior_samples))]\n",
    "\n",
    "# Summarize the fixed effects draws\n",
    "fixed_summary <- summarise_draws(fixed_draws, \n",
    "                                 mean,\n",
    "                                 sd,\n",
    "                                 ~quantile2(.x, probs = c(0.025, 0.975)))\n",
    "\n",
    "# Clean and relabel parameter names\n",
    "fixed_summary <- fixed_summary %>%\n",
    "  rename(\n",
    "    Estimate = mean,\n",
    "    SE = sd,  # posterior standard deviation\n",
    "    CI_lower = q2.5,\n",
    "    CI_upper = q97.5\n",
    "  ) %>%\n",
    "  mutate(Parameter = gsub(\"^b_\", \"\", variable)) %>%\n",
    "  select(Parameter, Estimate, SE, CI_lower, CI_upper)\n",
    "\n",
    "# View the full summary (n = Inf prints every row of the tibble)\n",
    "print(fixed_summary, n = Inf)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a82da44b",
   "metadata": {},
   "source": [
    "<h3>View results for random effects (group-level standard deviations)</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 439,
   "id": "9e8cb361",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(posterior)\n",
    "library(dplyr)\n",
    "\n",
    "# Extract standard deviation (group-level) parameters only\n",
    "sd_draws <- posterior_samples[, grep(\"^sd_\", colnames(posterior_samples))]\n",
    "\n",
    "# Summarize them\n",
    "sd_summary <- summarise_draws(sd_draws,\n",
    "                              mean,\n",
    "                              sd,\n",
    "                              ~quantile2(.x, probs = c(0.025, 0.975)))\n",
    "\n",
    "# Clean and relabel\n",
    "sd_summary <- sd_summary %>%\n",
    "  rename(\n",
    "    Estimate = mean,\n",
    "    SE = sd,\n",
    "    CI_lower = q2.5,\n",
    "    CI_upper = q97.5\n",
    "  ) %>%\n",
    "  mutate(Parameter = gsub(\"^sd_\", \"\", variable)) %>%\n",
    "  select(Parameter, Estimate, SE, CI_lower, CI_upper)\n",
    "\n",
    "# Print the clean summary of group-level SDs\n",
    "print(sd_summary, n = Inf)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7ac2361",
   "metadata": {},
   "source": [
    "<h3>Summarize the first three spline smoothing standard deviations (sds_) for each axis.</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 440,
   "id": "444c53b2",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(posterior)\n",
    "library(dplyr)\n",
    "\n",
    "# Axis 1\n",
    "spline_A1 <- posterior_samples[, grep(\"^sds_Axis1_sTimeElephant_ID_[1-3]$\", colnames(posterior_samples))]\n",
    "summary_A1 <- summarise_draws(spline_A1,\n",
    "                              mean,\n",
    "                              sd,\n",
    "                              ~quantile2(.x, probs = c(0.025, 0.975))) %>%\n",
    "  rename(Estimate = mean,\n",
    "         SE = sd,\n",
    "         CI_lower = q2.5,\n",
    "         CI_upper = q97.5) %>%\n",
    "  mutate(Axis = \"Axis1\",\n",
    "         Parameter = variable) %>%\n",
    "  select(Axis, Parameter, Estimate, SE, CI_lower, CI_upper)\n",
    "\n",
    "# Axis 2\n",
    "spline_A2 <- posterior_samples[, grep(\"^sds_Axis2_sTimeElephant_ID_[1-3]$\", colnames(posterior_samples))]\n",
    "summary_A2 <- summarise_draws(spline_A2,\n",
    "                              mean,\n",
    "                              sd,\n",
    "                              ~quantile2(.x, probs = c(0.025, 0.975))) %>%\n",
    "  rename(Estimate = mean,\n",
    "         SE = sd,\n",
    "         CI_lower = q2.5,\n",
    "         CI_upper = q97.5) %>%\n",
    "  mutate(Axis = \"Axis2\",\n",
    "         Parameter = variable) %>%\n",
    "  select(Axis, Parameter, Estimate, SE, CI_lower, CI_upper)\n",
    "\n",
    "# Axis 3\n",
    "spline_A3 <- posterior_samples[, grep(\"^sds_Axis3_sTimeElephant_ID_[1-3]$\", colnames(posterior_samples))]\n",
    "summary_A3 <- summarise_draws(spline_A3,\n",
    "                              mean,\n",
    "                              sd,\n",
    "                              ~quantile2(.x, probs = c(0.025, 0.975))) %>%\n",
    "  rename(Estimate = mean,\n",
    "         SE = sd,\n",
    "         CI_lower = q2.5,\n",
    "         CI_upper = q97.5) %>%\n",
    "  mutate(Axis = \"Axis3\",\n",
    "         Parameter = variable) %>%\n",
    "  select(Axis, Parameter, Estimate, SE, CI_lower, CI_upper)\n",
    "\n",
    "# Combine all\n",
    "spline_summary_all <- bind_rows(summary_A1, summary_A2, summary_A3)\n",
    "\n",
    "# Print\n",
    "print(spline_summary_all, n = Inf)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c51ec7b",
   "metadata": {},
   "source": [
    "<h2>Get residual standard deviation (sigma) results</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 441,
   "id": "f2e1d0e3",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(posterior)\n",
    "library(dplyr)\n",
    "\n",
    "sigma_draws <- posterior_samples[, grep(\"^sigma_\", colnames(posterior_samples))]\n",
    "\n",
    "# Summarize the residual standard deviation (sigma) draws\n",
    "sigma_summary <- summarise_draws(sigma_draws,\n",
    "                                 mean,\n",
    "                                 sd,\n",
    "                                 ~quantile2(.x, probs = c(0.025, 0.975)))\n",
    "\n",
    "# Clean and relabel parameter names\n",
    "sigma_summary <- sigma_summary %>%\n",
    "  rename(\n",
    "    Estimate = mean,\n",
    "    SE = sd,  # posterior standard deviation\n",
    "    CI_lower = q2.5,\n",
    "    CI_upper = q97.5\n",
    "  ) %>%\n",
    "  mutate(Parameter = gsub(\"^sigma_\", \"\", variable)) %>%\n",
    "  select(Parameter, Estimate, SE, CI_lower, CI_upper)\n",
    "\n",
    "# View the full summary (n = Inf prints every row of the tibble)\n",
    "print(sigma_summary, n = Inf)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9b00940",
   "metadata": {},
   "source": [
    "<h4>Install packages for plotting.</h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 442,
   "id": "2513fae4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# install.packages(\"quadprog\", type = \"source\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 443,
   "id": "22c89c57",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# install.packages(\"ggdist\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 444,
   "id": "a4da45b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# install.packages(\"RColorBrewer\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 445,
   "id": "abdc782b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(bayesplot)\n",
    "library(dplyr)\n",
    "library(tidyr)\n",
    "library(ggplot2)\n",
    "library(ggdist)\n",
    "\n",
    "# Extract posterior samples\n",
    "posterior_samples <- as.data.frame(all_axes_tightened_var)\n",
    "\n",
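    "# Function to extract fixed effects (excluding intercept) for one axis and label them.\n",
    "# The labels assume brms's coefficient order: main effects in formula order, then Sex × Age.\n",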
    "extract_axis_effects <- function(axis_name, label_suffix) {\n",
    "  cols <- grep(paste0(\"^b_\", axis_name, \"_\"), names(posterior_samples), value = TRUE)\n",
    "  cols <- cols[!grepl(\"Intercept\", cols)]\n",
    "  df <- posterior_samples[, cols]\n",
    "  colnames(df) <- c(\n",
    "    \"Sex (ref = male)\",\n",
    "    \"Age\",\n",
    "    \"Livestock in reserve\",\n",
    "    \"Grazing (vs browsing)\",\n",
    "    \"Time since study start\",\n",
    "    \"Sex × Age interaction\"\n",
    "  )\n",
    "  df_long <- df %>%\n",
    "    pivot_longer(cols = everything(), names_to = \"Parameter\", values_to = \"Value\") %>%\n",
    "    mutate(Axis = label_suffix)\n",
    "  return(df_long)\n",
    "}\n",
    "\n",
    "# Extract for each axis\n",
    "axis1_df <- extract_axis_effects(\"Axis1\", \"Axis 1\")\n",
    "axis2_df <- extract_axis_effects(\"Axis2\", \"Axis 2\")\n",
    "axis3_df <- extract_axis_effects(\"Axis3\", \"Axis 3\")\n",
    "\n",
    "# Combine all axes\n",
    "combined_df <- bind_rows(axis1_df, axis2_df, axis3_df)\n",
    "\n",
    "# Set axis order: Axis 1 at top, Axis 3 at bottom (in dodge group)\n",
    "combined_df$Axis <- factor(combined_df$Axis, levels = c(\"Axis 3\", \"Axis 2\", \"Axis 1\"))\n",
    "\n",
    "#Colors\n",
    "\n",
    "axis_colors <- RColorBrewer::brewer.pal(3, \"Dark2\")\n",
    "names(axis_colors) <- c(\"Axis 3\", \"Axis 2\", \"Axis 1\")  # match factor level order in dodge\n",
    "\n",
    "\n",
    "# Reverse parameter order so \"Sex\" is at the top\n",
    "combined_df$Parameter <- factor(combined_df$Parameter, levels = rev(unique(combined_df$Parameter)))\n",
    "\n",
    "# Plot: Overlay with color by axis\n",
    "ggplot(combined_df, aes(x = Value, y = Parameter, fill = Axis, color = Axis)) +\n",
    "  stat_halfeye(\n",
    "    .width = 0.95,\n",
    "    position = position_dodge(width = 0.7),\n",
    "    slab_alpha = 0.6,\n",
    "    point_interval = \"median_qi\"\n",
    "  ) +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"gray40\") +\n",
    "  scale_fill_manual(\n",
    "    values = axis_colors,\n",
    "    breaks = c(\"Axis 1\", \"Axis 2\", \"Axis 3\")\n",
    "  ) +\n",
    "  scale_color_manual(\n",
    "    values = axis_colors,\n",
    "    breaks = c(\"Axis 1\", \"Axis 2\", \"Axis 3\")\n",
    "  ) +\n",
    "  labs(\n",
    "    x = \"Effect Size\",\n",
    "    y = NULL,\n",
    "    fill = \"Axis\",\n",
    "    color = \"Axis\"\n",
    "  ) +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  theme(\n",
    "    axis.text.y = element_text(size = 14,face=\"bold\"),\n",
    "    axis.text.x = element_text(size = 12),\n",
    "    axis.title.x = element_text(size = 14, margin = margin(t = 10)),\n",
    "    legend.position = \"bottom\"\n",
    "  ) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 446,
   "id": "a40410cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(bayesplot)\n",
    "library(dplyr)\n",
    "library(tidyr)\n",
    "library(ggplot2)\n",
    "library(ggdist)\n",
    "\n",
    "# Extract posterior draws as dataframe\n",
    "posterior_samples <- as.data.frame(all_axes_tightened_var)\n",
    "\n",
    "# Find columns for random effect SDs\n",
    "sd_cols <- grep(\"^sd_\", names(posterior_samples), value = TRUE)\n",
    "\n",
    "# Extract those columns\n",
    "sd_df <- posterior_samples[, sd_cols]\n",
    "\n",
    "# Rename columns (check order matches posterior_samples!)\n",
    "colnames(sd_df) <- c(\n",
    "  \"Elephant_ID - Axis 1\",\n",
    "  \"Elephant_ID - Axis 2\",\n",
    "  \"Elephant_ID - Axis 3\",\n",
    "  \"Plate - Axis 1\",\n",
    "  \"Plate - Axis 2\",\n",
    "  \"Plate - Axis 3\"\n",
    ")\n",
    "\n",
    "# Pivot to long format\n",
    "sd_long <- sd_df %>%\n",
    "  pivot_longer(cols = everything(), names_to = \"Effect\", values_to = \"SD\")\n",
    "\n",
    "# Extract Axis info\n",
    "sd_long <- sd_long %>%\n",
    "  mutate(Axis = sub(\".* - (Axis [123])$\", \"\\\\1\", Effect))\n",
    "\n",
    "# Set color/legend Axis order (1 to 3 for light-to-dark)\n",
    "sd_long$Axis <- factor(sd_long$Axis, levels = c(\"Axis 1\", \"Axis 2\", \"Axis 3\"))\n",
    "\n",
    "# Set Effect factor levels in reverse so Axis 1 is at the top\n",
    "sd_long$Effect <- factor(sd_long$Effect, levels = rev(c(\n",
    "  \"Elephant_ID - Axis 1\",\n",
    "  \"Elephant_ID - Axis 2\",\n",
    "  \"Elephant_ID - Axis 3\",\n",
    "  \"Plate - Axis 1\",\n",
    "  \"Plate - Axis 2\",\n",
    "  \"Plate - Axis 3\"\n",
    ")))\n",
    "\n",
    "# Custom color map\n",
    "axis_colors <- c(\n",
    "  \"Axis 1\" = \"#7570B3\",  # purple\n",
    "  \"Axis 2\" = \"#D95F02\",  # orange\n",
    "  \"Axis 3\" = \"#1B9E77\"   # green\n",
    ")\n",
    "\n",
    "# Plot:\n",
    "ggplot(sd_long, aes(x = SD, y = Effect, fill = Axis, color = Axis)) +\n",
    "  stat_halfeye(\n",
    "    .width = 0.95,\n",
    "    point_interval = \"median_qi\",\n",
    "    alpha = 0.7,\n",
    "    position = position_dodge(width = 0.7)\n",
    "  ) +\n",
    "  geom_vline(xintercept = 0, linetype = \"dashed\", color = \"gray40\") +\n",
    "  scale_fill_manual(values = axis_colors) +\n",
    "  scale_color_manual(values = axis_colors) +\n",
    "  labs(\n",
    "    x = \"Posterior Standard Deviation\",\n",
    "    y = NULL,\n",
    "    fill = \"Axis\",\n",
    "    color = \"Axis\"\n",
    "  ) +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  theme(\n",
    "    axis.text.y = element_text(size = 14, face = \"bold\"),\n",
    "    axis.text.x = element_text(size = 12),\n",
    "    axis.title.x = element_text(size = 14, margin = margin(t = 10)),\n",
    "    legend.position = \"bottom\"\n",
    "  )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "81807835",
   "metadata": {},
   "source": [
    "<h2>Check which taxa are driving differences.<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 447,
   "id": "e60143ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from skbio import OrdinationResults"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 448,
   "id": "e611d4b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "biplot = qiime2.Artifact.load(\"ordination.qza\")\n",
    "ordination: OrdinationResults = biplot.view(OrdinationResults)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acb15cbe",
   "metadata": {},
   "source": [
    "<h4>Extract sample scores<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 449,
   "id": "da62827d",
   "metadata": {},
   "outputs": [],
   "source": [
    "samples = ordination.samples.copy()\n",
    "samples.index.name = \"SampleID\"\n",
    "samples.columns = [f\"Axis{i+1}\" for i in range(samples.shape[1])]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "795c94bf",
   "metadata": {},
   "source": [
    "<h4>Extract feature loadings<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 450,
   "id": "6e877326",
   "metadata": {},
   "outputs": [],
   "source": [
    "features = ordination.features.copy()\n",
    "features.index.name = \"FeatureID\"\n",
    "features.columns = [f\"Axis{i+1}\" for i in range(features.shape[1])]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2bd84a2",
   "metadata": {},
   "source": [
    "<h4>Save to tsv and csv<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 451,
   "id": "6a276955",
   "metadata": {},
   "outputs": [],
   "source": [
    "samples.to_csv(\"ordination.tsv\", sep=\"\\t\")\n",
    "samples.to_csv(\"ordination.csv\")\n",
    "features.to_csv(\"feature_loadings.tsv\", sep=\"\\t\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee00bb25",
   "metadata": {},
   "source": [
    "<h4>Load taxonomy<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 452,
   "id": "ec6a5807",
   "metadata": {},
   "outputs": [],
   "source": [
    "taxonomy = qiime2.Artifact.load(\"taxonomy.qza\").view(pd.DataFrame)\n",
    "taxonomy.index.name = \"FeatureID\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48ba09a2",
   "metadata": {},
   "source": [
    "<h4>Merge with loadings<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 453,
   "id": "2f6fe08d",
   "metadata": {},
   "outputs": [],
   "source": [
    "features_tax = features.merge(taxonomy, left_index=True, right_index=True)\n",
    "features_tax.to_csv(\"feature_loadings_with_taxonomy.tsv\", sep=\"\\t\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8a3f454",
   "metadata": {},
   "source": [
    "<h4>Start with Axes 2 and 3, where livestock showed the strongest effects.<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 454,
   "id": "c265e17c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "axes = [\"Axis1\", \"Axis2\", \"Axis3\"]\n",
    "top_taxa = features_tax[axes + [\"Taxon\"]].copy()\n",
    "\n",
    "# Create absolute loadings for both axes\n",
    "top_taxa[\"abs_axis2\"] = top_taxa[\"Axis2\"].abs()\n",
    "top_taxa[\"abs_axis3\"] = top_taxa[\"Axis3\"].abs()\n",
    "\n",
    "# Get top 10 by Axis2\n",
    "top10_axis2 = top_taxa.sort_values(\"abs_axis2\", ascending=False).head(10)\n",
    "\n",
    "# Get top 10 by Axis3\n",
    "top10_axis3 = top_taxa.sort_values(\"abs_axis3\", ascending=False).head(10)\n",
    "\n",
    "# Combine and drop features selected on both axes (deduplicate by FeatureID index)\n",
    "top10_combined = pd.concat([top10_axis2, top10_axis3])\n",
    "top10_combined = top10_combined[~top10_combined.index.duplicated(keep=\"first\")]\n",
    "\n",
    "print(top10_combined)"
   ]
  },
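  {
   "cell_type": "markdown",
   "id": "5f2c9a10",
   "metadata": {},
   "source": [
    "The top-10 selection above is repeated later for Axes 1 and 2 and for Axes 3 and 1. A minimal sketch of the same logic as one reusable function follows; it is not part of the original analysis. `top_taxa_by_axes` is a hypothetical helper, and it assumes loadings in columns named like `Axis1` with FeatureIDs as the index. The toy table only stands in for `features_tax`.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def top_taxa_by_axes(features_tax, axes, n=10):\n",
    "    # Hypothetical helper: union of the top-n features by |loading| on each axis\n",
    "    picks = []\n",
    "    for axis in axes:\n",
    "        # Rank features by the magnitude of their loading on this axis\n",
    "        order = features_tax[axis].abs().sort_values(ascending=False).index[:n]\n",
    "        picks.append(features_tax.loc[order])\n",
    "    combined = pd.concat(picks)\n",
    "    # A feature in the top n on several axes is kept once, in first-seen order\n",
    "    return combined[~combined.index.duplicated(keep='first')]\n",
    "\n",
    "\n",
    "# Toy loadings standing in for features_tax (made-up values)\n",
    "toy = pd.DataFrame(\n",
    "    {'Axis1': [0.9, -0.1, 0.2, -0.8],\n",
    "     'Axis2': [0.1, -0.7, 0.6, 0.05],\n",
    "     'Taxon': ['a', 'b', 'c', 'd']},\n",
    "    index=pd.Index(['f1', 'f2', 'f3', 'f4'], name='FeatureID'),\n",
    ")\n",
    "top = top_taxa_by_axes(toy, ['Axis1', 'Axis2'], n=3)\n",
    "print(list(top.index))  # ['f1', 'f4', 'f3', 'f2']\n",
    "```\n",
    "\n",
    "Deduplicating on the index, rather than with `drop_duplicates()`, means two distinct features are never collapsed just because their row values match; `drop_duplicates()` compares column values only and ignores the index."
   ]
  },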
  {
   "cell_type": "markdown",
   "id": "bd0e0830",
   "metadata": {},
   "source": [
    "<h4>Export Axes 2 and 3 top 10 for use in R<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 455,
   "id": "93b26b70",
   "metadata": {},
   "outputs": [],
   "source": [
    "# top10_combined.reset_index(drop=True).to_csv(\"top10_taxa_loadings_Axes_2_3.csv\", index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c91429a",
   "metadata": {},
   "source": [
    "<h4>Load sample metadata and join with ordination<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 456,
   "id": "648e44f3",
   "metadata": {},
   "outputs": [],
   "source": [
    "meta = pd.read_csv(\"metadata_2.csv\")\n",
    "meta = meta.rename(columns={\"#SampleID\": \"SampleID\"})\n",
    "meta = meta.set_index(\"SampleID\")\n",
    "samples_meta = samples.merge(meta, left_index=True, right_index=True)\n",
    "samples_meta.to_csv(\"ordination_with_metadata.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d4e5c22",
   "metadata": {},
   "source": [
    "<h2> Run pairwise comparisons with livestock as a categorical variable to see where the differences lie and choose how to color the plot. <h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd8ed010",
   "metadata": {},
   "source": [
    "<h3>Start with Axis 2, where community composition varied the most with livestock, then check Axes 3 and 1.<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 457,
   "id": "2cfadc72",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "samples <- read.csv(\"ordination_with_metadata.csv\", header=TRUE)\n",
    "\n",
    "samples$Livestock_avg_of_avgs <- as.factor(samples$Livestock_avg_of_avgs)\n",
    "\n",
    "pairwise.wilcox.test(samples$Axis2, samples$Livestock_avg_of_avgs, p.adjust.method = \"BH\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 458,
   "id": "b1464707",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "samples <- read.csv(\"ordination_with_metadata.csv\", header=TRUE)\n",
    "\n",
    "samples$Livestock_avg_of_avgs <- as.factor(samples$Livestock_avg_of_avgs)\n",
    "\n",
    "pairwise.wilcox.test(samples$Axis3, samples$Livestock_avg_of_avgs, p.adjust.method = \"BH\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 459,
   "id": "68e7e863",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "samples <- read.csv(\"ordination_with_metadata.csv\", header=TRUE)\n",
    "\n",
    "samples$Livestock_avg_of_avgs <- as.factor(samples$Livestock_avg_of_avgs)\n",
    "\n",
    "pairwise.wilcox.test(samples$Axis1, samples$Livestock_avg_of_avgs, p.adjust.method = \"BH\")"
   ]
  },
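  {
   "cell_type": "markdown",
   "id": "7d3e41b6",
   "metadata": {},
   "source": [
    "`pairwise.wilcox.test(..., p.adjust.method = \"BH\")` adjusts the pairwise p-values with the Benjamini-Hochberg procedure, which controls the false discovery rate. The Python sketch below illustrates the step-up adjustment (it is not R's own implementation): sort the m p-values, scale the i-th smallest by m/i, take the running minimum from the largest p-value downward, and cap at 1. `bh_adjust` is a hypothetical name; in practice `statsmodels.stats.multitest.multipletests(pvals, method='fdr_bh')` provides the same adjustment.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def bh_adjust(pvals):\n",
    "    # Benjamini-Hochberg adjusted p-values, returned in the input order\n",
    "    p = np.asarray(pvals, dtype=float)\n",
    "    m = p.size\n",
    "    order = np.argsort(p)                         # indices of p in ascending order\n",
    "    scaled = p[order] * m / np.arange(1, m + 1)   # p_(i) * m / i\n",
    "    # Running minimum from the largest p-value down keeps the result monotone\n",
    "    scaled = np.minimum.accumulate(scaled[::-1])[::-1]\n",
    "    adjusted = np.empty(m)\n",
    "    adjusted[order] = np.minimum(scaled, 1.0)\n",
    "    return adjusted\n",
    "\n",
    "\n",
    "print(bh_adjust([0.01, 0.04, 0.03, 0.20]))  # about 0.04, 0.053, 0.053, 0.2\n",
    "```\n",
    "\n",
    "With only two groups, as in the \"some\"/\"many\" tests below, there is a single comparison, so the adjustment leaves the p-value unchanged."
   ]
  },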
  {
   "cell_type": "markdown",
   "id": "44a6a5e6",
   "metadata": {},
   "source": [
    "<h2> Most differences appear to lie between reserves with fewer than ~200 livestock and those with more than ~800, on the axes that correlate with livestock numbers. Because the livestock counts are approximate, regroup them into two categories and repeat the pairwise test. <h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 460,
   "id": "a78db0b8",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(dplyr)\n",
    "\n",
    "samples <- read.csv(\"ordination_with_metadata.csv\", header=TRUE)\n",
    "\n",
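    "# Label reserves with >= 776 livestock as \"many\" and the rest as \"some\".\n",
    "# Note: the biplot cells below split at 200 with cut(); the two rules agree\n",
    "# only if no Livestock_avg_of_avgs value falls between 200 and 776.\n",
    "df <- samples %>%\n",
    "  mutate(Livestock_avg_of_avgs = ifelse(Livestock_avg_of_avgs >= 776, \"many\", \"some\"))\n",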
    "\n",
    "print(df$Livestock_avg_of_avgs)"
   ]
  },
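  {
   "cell_type": "markdown",
   "id": "9b0a6c27",
   "metadata": {},
   "source": [
    "The cell above labels reserves as \"many\" at 776 or more livestock, whereas the biplot cells below use `cut()` with breaks at 200, so values in (-1, 200] become \"Some\" and larger values \"Many\". The two rules give the same grouping only if no `Livestock_avg_of_avgs` value falls between 200 and 776. A minimal Python sketch of that check follows; `split_disagreements` is a hypothetical helper and the example values are invented, not the study's data.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def split_disagreements(values):\n",
    "    # Return the values that the two livestock groupings would label differently\n",
    "    x = pd.Series(values, dtype=float)\n",
    "    # Rule from the Wilcoxon cells: 'many' when >= 776, otherwise 'some'\n",
    "    rule_776 = pd.Series(np.where(x >= 776, 'many', 'some'), index=x.index)\n",
    "    # Rule from the biplot cells: R's cut() intervals (-1, 200] and (200, Inf],\n",
    "    # with labels lower-cased here so the two rules can be compared\n",
    "    rule_200 = pd.cut(x, bins=[-1, 200, np.inf], labels=['some', 'many']).astype(str)\n",
    "    return x[rule_776 != rule_200]\n",
    "\n",
    "\n",
    "print(split_disagreements([0, 150, 500, 900]).tolist())  # [500.0]\n",
    "```\n",
    "\n",
    "An empty result on the real `Livestock_avg_of_avgs` column would confirm that the Wilcoxon tests and the biplots use the same two groups."
   ]
  },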
  {
   "cell_type": "code",
   "execution_count": 461,
   "id": "3b433a19",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pairwise.wilcox.test(df$Axis2, df$Livestock_avg_of_avgs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e93844e6",
   "metadata": {},
   "source": [
    "<h2> Axis 2 appears to differ reliably between the \"some\" and \"many\" categories. <h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 462,
   "id": "4f2ee693",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pairwise.wilcox.test(df$Axis1, df$Livestock_avg_of_avgs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "83caa943",
   "metadata": {},
   "source": [
    "<h2> The same is not true of Axis 1, as expected given the results of the Bayesian model. <h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 463,
   "id": "6376f00c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "pairwise.wilcox.test(df$Axis3, df$Livestock_avg_of_avgs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a630b279",
   "metadata": {},
   "source": [
    "<h2>The difference along Axis 3 is also apparent with the \"some\"/\"many\" split.<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eeae0e36",
   "metadata": {},
   "source": [
    "<h2>Visualize with a biplot, coloring points by the livestock categories suggested by the pairwise comparisons and drawing arrows for the top 10 taxa, which include Methanobrevibacter.<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b41290fe",
   "metadata": {},
   "source": [
    "<h4>Install needed package and then plot<h4>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 464,
   "id": "371b63a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %%R\n",
    "\n",
    "# install.packages(\"ggrepel\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 465,
   "id": "8548367d",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(ggplot2)\n",
    "library(ggrepel)\n",
    "library(RColorBrewer)\n",
    "library(dplyr)\n",
    "\n",
    "# Load sample scores\n",
    "samples <- read.csv(\"ordination_with_metadata.csv\", header=TRUE)\n",
    "\n",
    "samples$Livestock_cat <- cut(samples$Livestock_avg_of_avgs,\n",
    "                             breaks = c(-1, 200, Inf),\n",
    "                             labels = c(\"Some\", \"Many\"))\n",
    "\n",
    "# Load top taxa vectors\n",
    "taxa <- read.csv(\"top10_taxa_loadings_Axes_2_3.csv\", header=TRUE)\n",
    "\n",
    "# Simplify taxonomy names\n",
    "simplify_taxonomy <- function(taxon_string) {\n",
    "  ranks <- unlist(strsplit(taxon_string, \";\\\\s*\"))\n",
    "  ranks <- rev(ranks)\n",
    "  for (rank in ranks) {\n",
    "    name <- sub(\"^[a-z]__*\", \"\", rank)\n",
    "    if (nchar(name) > 0 && name != \"Unassigned\") {\n",
    "      return(name)\n",
    "    }\n",
    "  }\n",
    "  return(\"Unclassified\")\n",
    "}\n",
    "\n",
    "taxa <- taxa %>%\n",
    "  mutate(SimpleName = sapply(Taxon, simplify_taxonomy))\n",
    "\n",
    "# Add helper column for direction\n",
    "taxa <- taxa %>%\n",
    "  mutate(Direction = ifelse(Axis2 < 0, \"Bottom\", \"Top\"))\n",
    "\n",
    "# Create label positions for both bottom and top taxa\n",
    "bottom_taxa <- taxa %>%\n",
    "  filter(Direction == \"Bottom\") %>%\n",
    "  arrange(Axis2) %>%\n",
    "  mutate(y_label = seq(min(Axis2) - 0.02,\n",
    "                       min(Axis2) - 0.02 - 0.07 * (n() - 1),\n",
    "                       by = -0.07))\n",
    "\n",
    "top_taxa <- taxa %>%\n",
    "  filter(Direction == \"Top\") %>%\n",
    "  arrange(desc(Axis2)) %>%\n",
    "  mutate(y_label = seq(max(Axis2) + 0.02,\n",
    "                       max(Axis2) + 0.02 + 0.07 * (n() - 1),\n",
    "                       by = 0.07))\n",
    "\n",
    "#Unified label formatting and size\n",
    "label_size <- 5.5\n",
    "\n",
    "# Define colors\n",
    "custom_colors <- brewer.pal(n = 8, name = \"Dark2\")[c(8, 6)]\n",
    "\n",
    "# Plot\n",
    "ggplot(samples, aes(x = Axis3, y = Axis2)) +\n",
    "  geom_point(aes(color = Livestock_cat), size = 3, alpha = 0.7) +\n",
    "  \n",
    "  # Arrows\n",
    "  geom_segment(data = taxa,\n",
    "               aes(x = 0, y = 0, xend = Axis3, yend = Axis2),\n",
    "               arrow = arrow(length = unit(0.2, \"cm\")),\n",
    "               color = \"blue4\", alpha = 0.8) +\n",
    "  \n",
    "  # Top taxa labels\n",
    "  geom_text(data = top_taxa,\n",
    "            aes(x = Axis3, y = y_label, label = SimpleName),\n",
    "            size = label_size, color = \"gray10\", hjust = 0) +\n",
    "  \n",
    "  geom_segment(data = top_taxa,\n",
    "               aes(x = Axis3, y = Axis2, xend = Axis3, yend = y_label),\n",
    "               color = \"blue4\", linetype = \"dashed\", linewidth = 0.25) +\n",
    "  \n",
    "  # Bottom taxa (same style as top)\n",
    "  geom_text(data = bottom_taxa,\n",
    "            aes(x = Axis3, y = y_label, label = SimpleName),\n",
    "            size = label_size, color = \"gray10\", hjust = 0) +\n",
    "  \n",
    "  geom_segment(data = bottom_taxa,\n",
    "               aes(x = Axis3, y = Axis2, xend = Axis3, yend = y_label),\n",
    "               color = \"blue4\", linetype = \"dashed\", linewidth = 0.25) +\n",
    "  \n",
    "  # Ellipses and theming\n",
    "  stat_ellipse(aes(fill = Livestock_cat, group = Livestock_cat),\n",
    "               geom = \"polygon\", alpha = 0.4, color = NA) +\n",
    "  scale_color_manual(values = custom_colors, name = \"Livestock in Reserves\") +\n",
    "  scale_fill_manual(values = custom_colors, name = \"Livestock in Reserves\") +\n",
    "  labs(x = \"Axis 3 (15.2% variance explained)\",\n",
    "       y = \"Axis 2 (28.6% variance explained)\") +\n",
    "  scale_x_continuous(limits = c(NA, 0.6)) +\n",
    "  scale_y_continuous(limits = c(-0.7, 0.55)) +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  theme(legend.position = \"right\") +\n",
    "  theme(axis.title = element_text(size = 18),       # Axis titles\n",
    "        axis.text = element_text(size = 16),        # Axis tick labels\n",
    "        legend.title = element_text(size = 18),     # Legend title\n",
    "        legend.text = element_text(size = 16))      # Legend text"
   ]
  },
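  {
   "cell_type": "markdown",
   "id": "c4e8f315",
   "metadata": {},
   "source": [
    "In the biplot above, labels are stacked away from the arrow tips so they do not overlap: taxa with a negative Axis 2 loading get labels starting 0.02 below the most negative loading and stepping down by 0.07, and the remaining taxa get labels starting 0.02 above the largest loading and stepping up by 0.07. The Python sketch below reproduces that arithmetic, for example to choose axis limits that keep every label visible; `stacked_label_positions` is a hypothetical helper and the loadings are made-up values.\n",
    "\n",
    "```python\n",
    "def stacked_label_positions(loadings, gap=0.02, step=0.07):\n",
    "    # Split as in the R code: negative loadings are 'Bottom', the rest 'Top'\n",
    "    bottom = sorted(v for v in loadings if v < 0)                # most negative first\n",
    "    top = sorted((v for v in loadings if v >= 0), reverse=True)  # largest first\n",
    "    # Stack labels away from the outermost arrow tip, one step per taxon\n",
    "    bottom_y = [bottom[0] - gap - step * k for k in range(len(bottom))]\n",
    "    top_y = [top[0] + gap + step * k for k in range(len(top))]\n",
    "    # (arrow tip, label position) pairs along the plotted y-axis\n",
    "    return list(zip(bottom, bottom_y)), list(zip(top, top_y))\n",
    "\n",
    "\n",
    "bottom_pairs, top_pairs = stacked_label_positions([-0.3, 0.1, -0.5, 0.4])\n",
    "print(bottom_pairs)  # tips -0.5, -0.3 with labels near -0.52, -0.59\n",
    "print(top_pairs)     # tips 0.4, 0.1 with labels near 0.42, 0.49\n",
    "```\n",
    "\n",
    "Because the stack grows by 0.07 per taxon, the limits set with `scale_y_continuous()` need to extend past the outermost label, otherwise ggplot2 drops that label with a warning."
   ]
  },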
  {
   "cell_type": "markdown",
   "id": "3edd731a",
   "metadata": {},
   "source": [
    "<h2>Make further plots showing how samples are arranged by each metadata variable, starting by merging the ordination scores with the metadata. In each plot, the axis along which the variable showed the largest effect is on the y-axis, and the axis with the second-largest effect is on the x-axis.<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 466,
   "id": "c521af87",
   "metadata": {},
   "outputs": [],
   "source": [
    "meta = pd.read_csv(\"metadata_2.csv\")\n",
    "meta = meta.rename(columns={\"#SampleID\": \"SampleID\"})\n",
    "meta = meta.set_index(\"SampleID\")\n",
    "samples_meta = samples.merge(meta, left_index=True, right_index=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 467,
   "id": "b33fe3e7",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(ggplot2)\n",
    "library(ggrepel)\n",
    "library(RColorBrewer)\n",
    "library(dplyr)\n",
    "\n",
    "# Load data\n",
    "samples <- read.csv(\"ordination_with_metadata.csv\", header=TRUE)\n",
    "\n",
    "head(samples)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec639e50",
   "metadata": {},
   "source": [
    "<h2>Livestock<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 468,
   "id": "26831fdb",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "\n",
    "samples$Livestock_cat <- cut(samples$Livestock_avg_of_avgs,\n",
    "                             breaks = c(-1, 200, Inf),\n",
    "                             labels = c(\"Some\",\"Many\"))\n",
    "\n",
    "custom_colors <- brewer.pal(n = 8, name = \"Dark2\")[c(8, 6)]  \n",
    "\n",
    "ggplot(samples, aes(x = Axis3, y = Axis2, color = Livestock_cat)) +\n",
    "  geom_point(size = 3, alpha = 0.7) +\n",
    "  stat_ellipse(aes(fill = Livestock_cat, group = Livestock_cat),\n",
    "               geom = \"polygon\", alpha = 0.4, color = NA) +\n",
    "  scale_color_manual(values = custom_colors, name = \"Livestock in Reserves\") +\n",
    "  scale_fill_manual(values = custom_colors, name = \"Livestock in Reserves\") +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  labs(x = \"Axis 3 (15.2% variance explained)\",\n",
    "       y = \"Axis 2 (28.6% variance explained)\") +\n",
    "  theme(legend.position = \"right\") +\n",
    "  theme(axis.title = element_text(size = 18),       # Axis titles\n",
    "        axis.text = element_text(size = 16),        # Axis tick labels\n",
    "        legend.title = element_text(size = 18),     # Legend title\n",
    "        legend.text = element_text(size = 16)   )    # Legend text"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "927e2f43",
   "metadata": {},
   "source": [
    "<h2>Grazing<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 469,
   "id": "581a050a",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "\n",
    "custom_colors_1 <- brewer.pal(n = 8, name = \"Dark2\")[c(4,5)]  \n",
    "\n",
    "samples$Diet <- ifelse(samples$Grazing == 1, \"grazing\", \"browsing\")\n",
    "\n",
    "ggplot(samples, aes(x = Axis1, y = Axis2,  color = Diet)) +\n",
    "  geom_point(size = 3, alpha = 0.7) +\n",
    "  stat_ellipse(aes(fill = Diet, group = Diet),\n",
    "               geom = \"polygon\", alpha = 0.4, color = NA) +\n",
    "  scale_color_manual(values = custom_colors_1, name = \"Diet\") +\n",
    "  scale_fill_manual(values = custom_colors_1, name = \"Diet\") +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  labs(x = \"Axis 1 (56.2% of variation explained)\",\n",
    "       y = \"Axis 2 (28.6% of variation explained)\") +\n",
    "  theme(legend.position = \"right\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2660181c",
   "metadata": {},
   "source": [
    "<h2>The pattern is harder to see here, likely because other factors not shown in this biplot also influence these axes.<h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cb4ae93",
   "metadata": {},
   "source": [
    "<h3>Check the top taxa for Axes 1 and 2.<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 470,
   "id": "b15352e9",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "axes = [\"Axis1\", \"Axis2\", \"Axis3\"]\n",
    "top_taxa = features_tax[axes + [\"Taxon\"]].copy()\n",
    "\n",
    "# Create absolute loadings for both axes\n",
    "top_taxa[\"abs_axis1\"] = top_taxa[\"Axis1\"].abs()\n",
    "top_taxa[\"abs_axis2\"] = top_taxa[\"Axis2\"].abs()\n",
    "\n",
    "# Get top 10 by Axis1\n",
    "top10_axis1 = top_taxa.sort_values(\"abs_axis1\", ascending=False).head(10)\n",
    "\n",
    "# Get top 10 by Axis2\n",
    "top10_axis2 = top_taxa.sort_values(\"abs_axis2\", ascending=False).head(10)\n",
    "\n",
    "# Combine and drop features selected on both axes (deduplicate by FeatureID index)\n",
    "top10_combined_1_and_2 = pd.concat([top10_axis1, top10_axis2])\n",
    "top10_combined_1_and_2 = top10_combined_1_and_2[~top10_combined_1_and_2.index.duplicated(keep=\"first\")]\n",
    "\n",
    "print(top10_combined_1_and_2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e9fb35f",
   "metadata": {},
   "source": [
    "<h3>Export the Axes 1 and 2 top taxa for use in R.<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 471,
   "id": "0fed7a62",
   "metadata": {},
   "outputs": [],
   "source": [
    "top10_combined_1_and_2.reset_index(drop=True).to_csv(\"top10_taxa_loadings_Axes_1_2.csv\", index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 472,
   "id": "a8a8a987",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(ggplot2)\n",
    "library(ggrepel)\n",
    "library(RColorBrewer)\n",
    "library(dplyr)\n",
    "\n",
    "# Load sample scores\n",
    "samples <- read.csv(\"ordination_with_metadata.csv\", header=TRUE)\n",
    "\n",
    "head(samples)\n",
    "\n",
    "samples$Diet <- ifelse(samples$Grazing == 1, \"increased grazing\", \"more browsing\")\n",
    "\n",
    "# Load top taxa vectors\n",
    "taxa <- read.csv(\"top10_taxa_loadings_Axes_1_2.csv\", header=TRUE)\n",
    "head(taxa)\n",
    "\n",
    "# Helper function to simplify taxonomy\n",
    "simplify_taxonomy <- function(taxon_string) {\n",
    "  ranks <- unlist(strsplit(taxon_string, \";\\\\s*\"))\n",
    "  ranks <- rev(ranks)  # Start from most specific\n",
    "  for (rank in ranks) {\n",
    "    name <- sub(\"^[a-z]__*\", \"\", rank)\n",
    "    if (nchar(name) > 0 && name != \"Unassigned\") {\n",
    "      return(name)\n",
    "    }\n",
    "  }\n",
    "  return(\"Unclassified\")\n",
    "}\n",
    "\n",
    "taxa <- taxa %>%\n",
    "  mutate(SimpleName = sapply(Taxon, simplify_taxonomy))\n",
    "\n",
    "\n",
    "# Add helper column to tag arrows pointing downward\n",
    "taxa <- taxa %>%\n",
    "  mutate(Direction = ifelse(Axis2 < 0, \"Bottom\", \"Top\"))\n",
    "\n",
    "# Create label positions for both bottom and top taxa\n",
    "bottom_taxa <- taxa %>%\n",
    "  filter(Direction == \"Bottom\") %>%\n",
    "  arrange(Axis2) %>%\n",
    "  mutate(y_label = seq(min(Axis2) - 0.02,\n",
    "                       min(Axis2) - 0.02 - 0.07 * (n() - 1),\n",
    "                       by = -0.07))\n",
    "\n",
    "top_taxa <- taxa %>%\n",
    "  filter(Direction == \"Top\") %>%\n",
    "  arrange(desc(Axis2)) %>%\n",
    "  mutate(y_label = seq(max(Axis2) + 0.02,\n",
    "                       max(Axis2) + 0.02 + 0.07 * (n() - 1),\n",
    "                       by = 0.07))\n",
    "\n",
    "#Unified label formatting and size\n",
    "label_size <- 5  \n",
    "\n",
    "#Define colors\n",
    "custom_colors_1 <- brewer.pal(n = 8, name = \"Dark2\")[c(4,5)]   \n",
    "\n",
    "# Plot\n",
    "ggplot(samples, aes(x = Axis1, y = Axis2)) +\n",
    "  geom_point(aes(fill = Diet), shape = 21, size = 3, alpha = 0.7) +\n",
    "  \n",
    "  # Arrows\n",
    "  geom_segment(data = taxa,\n",
    "               aes(x = 0, y = 0, xend = Axis1, yend = Axis2),\n",
    "               arrow = arrow(length = unit(0.2, \"cm\")),\n",
    "               color = \"blue4\", alpha = 0.8) +\n",
    "  \n",
    "  # Top taxa labels\n",
    "  geom_text(data = top_taxa,\n",
    "            aes(x = Axis1, y = y_label, label = SimpleName),\n",
    "            size = label_size, color = \"gray10\", hjust = 0) +  \n",
    "  \n",
    "  geom_segment(data = top_taxa,\n",
    "               aes(x = Axis1, y = Axis2, xend = Axis1, yend = y_label),\n",
    "               color = \"blue4\", linetype = \"dashed\", linewidth = 0.25) +  \n",
    "  \n",
    "  # Bottom taxa (same style as top)\n",
    "  geom_text(data = bottom_taxa,\n",
    "            aes(x = Axis1, y = y_label, label = SimpleName),\n",
    "            size = label_size, color = \"gray10\", hjust = 0) +  \n",
    "  \n",
    "  geom_segment(data = bottom_taxa,\n",
    "               aes(x = Axis1, y = Axis2, xend = Axis1, yend = y_label),\n",
    "               color = \"blue4\", linetype = \"dashed\", linewidth = 0.25) +\n",
    "  \n",
    "  # Ellipses and theming\n",
    "  stat_ellipse(aes(fill = Diet, group = Diet),\n",
    "               geom = \"polygon\", alpha = 0.4, color = NA) +\n",
    "  scale_color_manual(values = custom_colors_1, name = \"\") +\n",
    "  scale_fill_manual(values = custom_colors_1, name = \"Diet\") +\n",
    "  labs(x = \"Axis 1 (56.2% variance explained)\",\n",
    "       y = \"Axis 2 (28.6% variance explained)\") +\n",
    "  scale_x_continuous(limits = c(-.4, 0.6)) +\n",
    "  scale_y_continuous(limits=c(-1.1,0.55)) +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  theme(legend.position = \"right\")"
   ]
  },
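  {
   "cell_type": "markdown",
   "id": "e19d2b73",
   "metadata": {},
   "source": [
    "The `simplify_taxonomy()` helper walks a taxonomy string from the most specific rank upward and returns the first name left after stripping the rank prefix (e.g. `g__`), skipping empty ranks and \"Unassigned\". A Python sketch of the same logic, which could be used to check the labels before exporting the CSV files, is below; it is an illustrative translation of the R helper, and the example strings are made up.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "\n",
    "def simplify_taxonomy(taxon_string):\n",
    "    # Walk the ranks from most to least specific, as the R helper does\n",
    "    for rank in reversed(re.split(r';\\s*', taxon_string)):\n",
    "        # Strip a rank prefix such as 'g__'\n",
    "        name = re.sub(r'^[a-z]__*', '', rank)\n",
    "        if name and name != 'Unassigned':\n",
    "            return name\n",
    "    return 'Unclassified'\n",
    "\n",
    "\n",
    "print(simplify_taxonomy('d__Archaea; p__Euryarchaeota; g__Methanobrevibacter'))  # Methanobrevibacter\n",
    "print(simplify_taxonomy('d__Bacteria; p__Firmicutes; g__'))  # Firmicutes\n",
    "```\n",
    "\n",
    "An empty genus rank such as `g__` falls through to the next rank up, so unresolved features are labeled by the deepest rank that has a name."
   ]
  },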
  {
   "cell_type": "markdown",
   "id": "6b485353",
   "metadata": {},
   "source": [
    "<h3>No clear pattern emerges from these taxa.<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54642ab2",
   "metadata": {},
   "source": [
    "<h2>Plate<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 473,
   "id": "4f1a393b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "samples$Plate <- as.factor(samples$Plate)\n",
    "\n",
    "custom_colors_2 <- brewer.pal(n = 11, name = \"BrBG\")[c(1,3,9,11)]  \n",
    "\n",
    "\n",
    "ggplot(samples, aes(x = Axis1, y = Axis3,  color = Plate)) +\n",
    "  geom_point(size = 3, alpha = 0.7) +\n",
    "  stat_ellipse(aes(fill = Plate, group = Plate),\n",
    "               geom = \"polygon\", alpha = 0.4, color=NA) +\n",
    "  scale_color_manual(values = custom_colors_2, name = \"Plate\") +\n",
    "  scale_fill_manual(values = custom_colors_2, name = \"Plate\") +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  labs(x = \"Axis 1 (56.2% of variation explained)\",\n",
    "       y = \"Axis 3 (15.2% of variation explained)\") +\n",
    "  theme(legend.position = \"right\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7fa8a2b0",
   "metadata": {},
   "source": [
    "<h3>Plate showed the largest differences along Axes 3 and 1, so extract the top 10 taxa for Axes 3 and 1 and check whether any patterns emerge.<h3>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1359a22a",
   "metadata": {},
   "source": [
    "<h3>Extract Axes 3 and 1 top taxa<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 474,
   "id": "41b09d55",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# axes = [\"Axis1\", \"Axis2\", \"Axis3\"]\n",
    "# top_taxa = features_tax[axes + [\"Taxon\"]].copy()\n",
    "\n",
    "# #Create absolute loadings for both axes\n",
    "# top_taxa[\"abs_axis3\"] = top_taxa[\"Axis3\"].abs() \n",
    "# top_taxa[\"abs_axis1\"] = top_taxa[\"Axis1\"].abs()\n",
    "\n",
    "# #Get top 10 by Axis3\n",
    "# top10_axis3 = top_taxa.sort_values(\"abs_axis3\", ascending=False).head(10)\n",
    "\n",
    "# #Get top 10 by Axis1\n",
    "# top10_axis1 = top_taxa.sort_values(\"abs_axis1\", ascending=False).head(10)\n",
    "\n",
    "# #Combine and remove duplicates\n",
    "# top10_combined_3_and_1 = pd.concat([top10_axis3, top10_axis1]).drop_duplicates(keep=\"first\")\n",
    "\n",
    "# print(top10_combined_3_and_1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9f9933d",
   "metadata": {},
   "source": [
    "<h3>Save axes 3 and 1 top taxa to .csv for use in R<h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 475,
   "id": "bf114648",
   "metadata": {},
   "outputs": [],
   "source": [
    "# top10_combined_3_and_1.reset_index(drop=True).to_csv(\"top10_taxa_loadings_Axes_3_1.csv\", index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 476,
   "id": "a0436b72",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "samples$Plate <- as.factor(samples$Plate)\n",
    "\n",
    "custom_colors_2 <- brewer.pal(n = 11, name = \"BrBG\")[c(1,3,9,11)] \n",
    "\n",
    "\n",
    "# Load top taxa vectors\n",
    "taxa <- read.csv(\"top10_taxa_loadings_Axes_3_1.csv\", header=TRUE)\n",
    "head(taxa)\n",
    "\n",
    "# Helper function to simplify taxonomy\n",
    "simplify_taxonomy <- function(taxon_string) {\n",
    "  ranks <- unlist(strsplit(taxon_string, \";\\\\s*\"))\n",
    "  ranks <- rev(ranks)  # Start from most specific\n",
    "  for (rank in ranks) {\n",
    "    name <- sub(\"^[a-z]__*\", \"\", rank)\n",
    "    if (nchar(name) > 0 && name != \"Unassigned\") {\n",
    "      return(name)\n",
    "    }\n",
    "  }\n",
    "  return(\"Unclassified\")\n",
    "}\n",
    "\n",
    "taxa <- taxa %>%\n",
    "  mutate(SimpleName = sapply(Taxon, simplify_taxonomy))\n",
    "\n",
    "# Add helper column to tag arrows pointing downward\n",
    "taxa <- taxa %>%\n",
    "  mutate(Direction = ifelse(Axis3 < 0, \"Bottom\", \"Top\"))\n",
    "\n",
    "# Create label positions for both bottom and top taxa\n",
    "bottom_taxa <- taxa %>%\n",
    "  filter(Direction == \"Bottom\") %>%\n",
    "  arrange(Axis3) %>%\n",
    "  mutate(y_label = seq(min(Axis3) - 0.02,\n",
    "                       min(Axis3) - 0.02 - 0.07 * (n() - 1),\n",
    "                       by = -0.07))\n",
    "\n",
    "top_taxa <- taxa %>%\n",
    "  filter(Direction == \"Top\") %>%\n",
    "  arrange(desc(Axis3)) %>%\n",
    "  mutate(y_label = seq(max(Axis3) + 0.02,\n",
    "                       max(Axis3) + 0.02 + 0.07 * (n() - 1),\n",
    "                       by = 0.07))\n",
    "\n",
    "#Unified label formatting and size\n",
    "label_size <- 5  \n",
    "\n",
    "# Plot\n",
    "ggplot(samples, aes(x = Axis1, y = Axis3, color = Plate)) +\n",
    "  geom_point(size = 3, alpha = 0.7) +\n",
    "  \n",
    "  # Arrows\n",
    "  geom_segment(data = taxa,\n",
    "               aes(x = 0, y = 0, xend = Axis1, yend = Axis3),\n",
    "               arrow = arrow(length = unit(0.2, \"cm\")),\n",
    "               color = \"blue4\", alpha = 0.8) +\n",
    "  \n",
    "  # Top taxa labels\n",
    "  geom_text(data = top_taxa,\n",
    "            aes(x = Axis1, y = y_label, label = SimpleName),\n",
    "            size = label_size, color = \"gray10\", hjust = 0) +  \n",
    "  \n",
    "  geom_segment(data = top_taxa,\n",
    "               aes(x = Axis1, y = Axis3, xend = Axis1, yend = y_label),\n",
    "               color = \"blue4\", linetype = \"dashed\", linewidth = 0.25) +  \n",
    "  \n",
    "  # Bottom taxa (same style as top)\n",
    "  geom_text(data = bottom_taxa,\n",
    "            aes(x = Axis1, y = y_label, label = SimpleName),\n",
    "            size = label_size, color = \"gray10\", hjust = 0) +  \n",
    "  \n",
    "  geom_segment(data = bottom_taxa,\n",
    "               aes(x = Axis1, y = Axis3, xend = Axis1, yend = y_label),\n",
    "               color = \"blue4\", linetype = \"dashed\", linewidth = 0.25) +\n",
    "  stat_ellipse(aes(fill = Plate, group = Plate),\n",
    "               geom = \"polygon\", alpha = 0.4, color=NA) +\n",
    "  scale_color_manual(values = custom_colors_2, name = \"Plate\") +\n",
    "  scale_fill_manual(values = custom_colors_2, name = \"Plate\") +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  labs(x = \"Axis 1 (56.2% of variation explained)\",\n",
    "       y = \"Axis 3 (15.2% of variation explained)\") +\n",
    "  theme(legend.position = \"right\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26bc0eb6",
   "metadata": {},
   "source": [
    "<h2>Elephant ID, using the eight elephants with the most samples<h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 477,
   "id": "dccc0ca0",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "samples$Elephant_ID <- as.factor(samples$Elephant_ID)\n",
    "\n",
    "# Count how many samples each Elephant_ID has\n",
    "top_individuals <- samples %>%\n",
    "  count(Elephant_ID, sort = TRUE) %>%\n",
    "  slice_max(n, n = 7)   # Top 7 individuals by sample count; ties at the cutoff are kept, so more than 7 may be returned\n",
    "\n",
    "# Filter the original samples data frame to include only those individuals\n",
    "samples_top <- samples %>%\n",
    "  filter(Elephant_ID %in% top_individuals$Elephant_ID)\n",
    "\n",
    "# Now plot only those individuals\n",
    "ggplot(samples_top, aes(x = Axis3, y = Axis2)) +\n",
    "  geom_point(size = 3, alpha = 0.7, aes(color = Elephant_ID)) +\n",
    "  stat_ellipse(aes(fill = Elephant_ID, group = Elephant_ID),\n",
    "               geom = \"polygon\", alpha = 0.2, color = NA) +\n",
    "  scale_color_brewer(palette = \"Accent\") +\n",
    "  scale_fill_brewer(palette = \"Accent\") +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  labs(x = \"Axis 3 (15.2% of variation explained)\", y = \"Axis 2 (28.6% of variation explained)\") +\n",
    "  theme(legend.position = \"right\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b64141f",
   "metadata": {},
   "source": [
    "<h3>Add the top 10 taxa loadings</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 478,
   "id": "66e3191d",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(ggplot2)\n",
    "library(ggrepel)\n",
    "library(RColorBrewer)\n",
    "\n",
    "samples$Elephant_ID <- as.factor(samples$Elephant_ID)\n",
    "\n",
    "# Count how many samples each Elephant_ID has\n",
    "top_individuals <- samples %>%\n",
    "  count(Elephant_ID, sort = TRUE) %>%\n",
    "  slice_max(n, n = 7)   # Top 7 individuals by sample count; ties at the cutoff are kept, so more than 7 may be returned\n",
    "\n",
    "# Filter the original samples data frame to include only those individuals\n",
    "samples_top <- samples %>%\n",
    "  filter(Elephant_ID %in% top_individuals$Elephant_ID)\n",
    "\n",
    "# Load top taxa vectors\n",
    "taxa <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/top10_taxa_loadings_Axes_2_3.csv\", header=TRUE)\n",
    "\n",
    "# Simplify taxonomy names\n",
    "simplify_taxonomy <- function(taxon_string) {\n",
    "  ranks <- unlist(strsplit(taxon_string, \";\\\\s*\"))\n",
    "  ranks <- rev(ranks)\n",
    "  for (rank in ranks) {\n",
    "    name <- sub(\"^[a-z]__*\", \"\", rank)\n",
    "    if (nchar(name) > 0 && name != \"Unassigned\") {\n",
    "      return(name)\n",
    "    }\n",
    "  }\n",
    "  return(\"Unclassified\")\n",
    "}\n",
    "\n",
    "taxa <- taxa %>%\n",
    "  mutate(SimpleName = sapply(Taxon, simplify_taxonomy))\n",
    "\n",
    "# Add helper column for direction\n",
    "taxa <- taxa %>%\n",
    "  mutate(Direction = ifelse(Axis2 < 0, \"Bottom\", \"Top\"))\n",
    "\n",
    "# Create label positions for both bottom and top taxa\n",
    "bottom_taxa <- taxa %>%\n",
    "  filter(Direction == \"Bottom\") %>%\n",
    "  arrange(Axis2) %>%\n",
    "  mutate(y_label = seq(min(Axis2) - 0.02,\n",
    "                       min(Axis2) - 0.02 - 0.07 * (n() - 1),\n",
    "                       by = -0.07))\n",
    "\n",
    "top_taxa <- taxa %>%\n",
    "  filter(Direction == \"Top\") %>%\n",
    "  arrange(desc(Axis2)) %>%\n",
    "  mutate(y_label = seq(max(Axis2) + 0.02,\n",
    "                       max(Axis2) + 0.02 + 0.07 * (n() - 1),\n",
    "                       by = 0.07))\n",
    "\n",
    "#Unified label formatting and size\n",
    "label_size <- 5  \n",
    "\n",
    "\n",
    "# Now plot the individuals with the top ten taxa for axes 2 and 3\n",
    "ggplot(samples_top, aes(x = Axis3, y = Axis2)) +\n",
    "  geom_point(size = 3, alpha = 0.7, aes(color = Elephant_ID)) +\n",
    "  \n",
    "  # Arrows\n",
    "  geom_segment(data = taxa,\n",
    "               aes(x = 0, y = 0, xend = Axis3, yend = Axis2),\n",
    "               arrow = arrow(length = unit(0.2, \"cm\")),\n",
    "               color = \"blue4\", alpha = 0.8) +\n",
    "  \n",
    "  # Top taxa labels\n",
    "  geom_text(data = top_taxa,\n",
    "            aes(x = Axis3, y = y_label, label = SimpleName),\n",
    "            size = label_size, color = \"gray10\", hjust = 0) +  \n",
    "  \n",
    "  geom_segment(data = top_taxa,\n",
    "               aes(x = Axis3, y = Axis2, xend = Axis3, yend = y_label),\n",
    "               color = \"blue4\", linetype = \"dashed\", linewidth = 0.25) +  \n",
    "  \n",
    "  # Bottom taxa (same style as top)\n",
    "  geom_text(data = bottom_taxa,\n",
    "            aes(x = Axis3, y = y_label, label = SimpleName),\n",
    "            size = label_size, color = \"gray10\", hjust = 0) +  \n",
    "  \n",
    "  geom_segment(data = bottom_taxa,\n",
    "               aes(x = Axis3, y = Axis2, xend = Axis3, yend = y_label),\n",
    "               color = \"blue4\", linetype = \"dashed\", linewidth = 0.25) +\n",
    "  \n",
    "  stat_ellipse(aes(fill = Elephant_ID, group = Elephant_ID),\n",
    "               geom = \"polygon\", alpha = 0.2, color = NA) +\n",
    "  scale_color_brewer(palette = \"Accent\") +\n",
    "  scale_fill_brewer(palette = \"Accent\") +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  labs(x = \"Axis 3 (15.2% of variation explained)\", y = \"Axis 2 (28.6% of variation explained)\") +\n",
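    "  # Note: scale limits remove points outside this range before the ellipses are computed;\n",
    "  # coord_cartesian() would zoom in without dropping data\n",
    "  scale_x_continuous(limits = c(NA, 0.6)) +\n",
    "  scale_y_continuous(limits = c(-0.7, 0.55)) +\n",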
    "  theme(legend.position = \"right\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3495645",
   "metadata": {},
   "source": [
    "<h2>Time</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 479,
   "id": "cb9f18ac",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "samples$Date.Sampled <- as.Date(samples$Date.Sampled,format=\"%m/%d/%y\")\n",
    "samples$Time <- as.numeric(samples$Date.Sampled)\n",
    "\n",
    "samples$Time <- cut(samples$Time,\n",
    "                    breaks = 5,  # Split into 5 equal-width bins\n",
    "                    labels = c(\"Start\", \"Early\", \"Mid\", \"Late\", \"End\"),\n",
    "                    ordered_result = TRUE)\n",
    "\n",
    "\n",
    "ggplot(samples, aes(x = Axis3, y = Axis2,  color = Time)) +\n",
    "  geom_point(size = 3, alpha = 0.7) +\n",
    "  stat_ellipse(aes(fill = Time, group = Time),\n",
    "               geom = \"polygon\", alpha = 0.4, color=NA) +\n",
    "  scale_color_brewer(palette = \"PRGn\") +\n",
    "  scale_fill_brewer(palette = \"PRGn\") +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  labs(x = \"Axis 3 (15.2% of variation explained)\",\n",
    "       y = \"Axis 2 (28.6% of variation explained)\") +\n",
    "  theme(legend.position = \"right\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26952c40",
   "metadata": {},
   "source": [
    "<h3>Add the top 10 taxa loadings again</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 480,
   "id": "e12c0601",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "samples$Date.Sampled <- as.Date(samples$Date.Sampled,format=\"%m/%d/%y\")\n",
    "samples$Time <- as.numeric(samples$Date.Sampled)\n",
    "\n",
    "samples$Time <- cut(samples$Time,\n",
    "                    breaks = 5,  # Split into 5 equal-width bins\n",
    "                    labels = c(\"Start\", \"Early\", \"Mid\", \"Late\", \"End\"),\n",
    "                    ordered_result = TRUE)\n",
    "\n",
    "# Load top taxa vectors\n",
    "taxa <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/top10_taxa_loadings_Axes_2_3.csv\", header=TRUE)\n",
    "\n",
    "# Simplify taxonomy names\n",
    "simplify_taxonomy <- function(taxon_string) {\n",
    "  ranks <- unlist(strsplit(taxon_string, \";\\\\s*\"))\n",
    "  ranks <- rev(ranks)\n",
    "  for (rank in ranks) {\n",
    "    name <- sub(\"^[a-z]__*\", \"\", rank)\n",
    "    if (nchar(name) > 0 && name != \"Unassigned\") {\n",
    "      return(name)\n",
    "    }\n",
    "  }\n",
    "  return(\"Unclassified\")\n",
    "}\n",
    "\n",
    "taxa <- taxa %>%\n",
    "  mutate(SimpleName = sapply(Taxon, simplify_taxonomy))\n",
    "\n",
    "# Add helper column for direction\n",
    "taxa <- taxa %>%\n",
    "  mutate(Direction = ifelse(Axis2 < 0, \"Bottom\", \"Top\"))\n",
    "\n",
    "# Create label positions for both bottom and top taxa\n",
    "bottom_taxa <- taxa %>%\n",
    "  filter(Direction == \"Bottom\") %>%\n",
    "  arrange(Axis2) %>%\n",
    "  mutate(y_label = seq(min(Axis2) - 0.02,\n",
    "                       min(Axis2) - 0.02 - 0.07 * (n() - 1),\n",
    "                       by = -0.07))\n",
    "\n",
    "top_taxa <- taxa %>%\n",
    "  filter(Direction == \"Top\") %>%\n",
    "  arrange(desc(Axis2)) %>%\n",
    "  mutate(y_label = seq(max(Axis2) + 0.02,\n",
    "                       max(Axis2) + 0.02 + 0.07 * (n() - 1),\n",
    "                       by = 0.07))\n",
    "\n",
    "#Unified label formatting and size\n",
    "label_size <- 5  \n",
    "\n",
    "#Plot\n",
    "\n",
    "ggplot(samples, aes(x = Axis3, y = Axis2)) +\n",
    "  geom_point(aes(color = Time), size = 3, alpha = 0.7) +\n",
    "  \n",
    "  # Arrows\n",
    "  geom_segment(data = taxa,\n",
    "               aes(x = 0, y = 0, xend = Axis3, yend = Axis2),\n",
    "               arrow = arrow(length = unit(0.2, \"cm\")),\n",
    "               color = \"blue4\", alpha = 0.8) +\n",
    "  \n",
    "  # Top taxa labels\n",
    "  geom_text(data = top_taxa,\n",
    "            aes(x = Axis3, y = y_label, label = SimpleName),\n",
    "            size = label_size, color = \"gray10\", hjust = 0) +  \n",
    "  \n",
    "  geom_segment(data = top_taxa,\n",
    "               aes(x = Axis3, y = Axis2, xend = Axis3, yend = y_label),\n",
    "               color = \"blue4\", linetype = \"dashed\", linewidth = 0.25) +  \n",
    "  \n",
    "  # Bottom taxa (same style as top)\n",
    "  geom_text(data = bottom_taxa,\n",
    "            aes(x = Axis3, y = y_label, label = SimpleName),\n",
    "            size = label_size, color = \"gray10\", hjust = 0) +  \n",
    "  \n",
    "  geom_segment(data = bottom_taxa,\n",
    "               aes(x = Axis3, y = Axis2, xend = Axis3, yend = y_label),\n",
    "               color = \"blue4\", linetype = \"dashed\", linewidth = 0.25) +\n",
    "  \n",
    "  # Ellipses by time\n",
    "  stat_ellipse(aes(fill = Time, group = Time),\n",
    "               geom = \"polygon\", alpha = 0.4, color = NA) +\n",
    "  \n",
    "  # Scales and themes\n",
    "  scale_color_brewer(palette = \"PRGn\") +\n",
    "  scale_fill_brewer(palette = \"PRGn\") +\n",
    "  theme_minimal(base_size = 14) +\n",
    "  labs(x = \"Axis 3 (15.2% of variation explained)\",\n",
    "       y = \"Axis 2 (28.6% of variation explained)\",\n",
    "       color = \"Time\", fill = \"Time\") +\n",
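    "  # Note: scale limits remove points outside this range before the ellipses are computed;\n",
    "  # coord_cartesian() would zoom in without dropping data\n",
    "  scale_x_continuous(limits = c(NA, 0.6)) +\n",
    "  scale_y_continuous(limits = c(-0.7, 0.55)) +\n",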
    "  theme(legend.position = \"right\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b23b43d3",
   "metadata": {},
   "source": [
    "<h1>New figure showing the timing of sampling for each individual, suggested by a reviewer</h1>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 481,
   "id": "061cfad9",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "library(ggplot2)\n",
    "library(readr)\n",
    "\n",
    "data <- read_csv(\"metadata_2.csv\")\n",
    "\n",
    "data$Date.Sampled <- as.Date(data$Date.Sampled, format=\"%m/%d/%y\")\n",
    "\n",
    "start_date <- as.Date(\"2015-06-01\") \n",
    "\n",
    "ggplot(data, aes(x = Date.Sampled, y = Elephant_ID)) +\n",
    "  geom_point() +\n",
    "  theme_minimal() +\n",
    "  labs(title = \"Sample Collection Over Time\",\n",
    "       x = \"Time\",\n",
    "       y = \"Individual\") +\n",
    "  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +\n",
    "  scale_x_date(\n",
    "    breaks = seq(start_date, max(data$Date.Sampled, na.rm = TRUE), by = \"month\"),\n",
    "    labels = scales::date_format(\"%b %d %y\")\n",
    "  )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71a42bb3",
   "metadata": {},
   "source": [
    "<h2>Code for the relative abundance figure requested by a reviewer</h2>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "031be494",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%R\n",
    "\n",
    "#Stacked barplot generation with microbiome data\n",
    "#https://www.youtube.com/watch?v=siIoupAnILk\n",
    "\n",
    "#Load tidyverse and ggplot2 packages\n",
    "library(tidyverse)\n",
    "library(ggplot2)\n",
    "\n",
    "# Load the ASV table collapsed to phylum level\n",
    "data <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/level-2.csv\")\n",
    "\n",
    "#Make data into a data frame\n",
    "data=data.frame(data)\n",
    "\n",
    "# Drop metadata columns so that only the sample index and phylum counts remain for pivoting\n",
    "data = select(data, -c(Plate,Row,Column,Elephant_ID,Date.Sampled,State,Sample_day,Sample_month_progression,Sample_year,Sex,Estimated.Birthdate,Age.at.Sampling,Age.at.Sampling.stdz,AgeClass,Family,With_family,Orphan_status,Livestock_avg_of_avgs,Livestock_avg_of_avgs_stdz,Livestock_categorical,Livestock_avgs,Livestock_avgs_stdz,Cort,Pregnant,Lactating,Strongylid_Epg,NDVI,NDVI_stdz,Season,Time.Dropped,Time.Collected,Time.on.Ground,Known.Storage.Method,Possible.storage.method.1,Possible.storage.method.2,Known.Sample.weight,Sample.weight.if.method.1,Sample.weight.if.method.2,Notes,Condition.Notes,Alpha.Diversity,NDVI_stdDev,NDVI_stdDev_stdz))\n",
    "\n",
    "# Rename the phylum columns to the bare phylum name\n",
    "data <- data %>%\n",
    "  rename(\n",
    "    Firmicutes = d__Bacteria.p__Firmicutes,\n",
    "    Thermoplasmatota = d__Archaea.p__Thermoplasmatota,\n",
    "    Bacteroidota = d__Bacteria.p__Bacteroidota,\n",
    "    Verrucomicrobiota = d__Bacteria.p__Verrucomicrobiota,\n",
    "    Myxococcota = d__Bacteria.p__Myxococcota,\n",
    "    Spirochaetota = d__Bacteria.p__Spirochaetota,\n",
    "    Armatimonadota = d__Bacteria.p__Armatimonadota,\n",
    "    Proteobacteria = d__Bacteria.p__Proteobacteria,\n",
    "    Halobacterota = d__Archaea.p__Halobacterota,\n",
    "    Patescibacteria = d__Bacteria.p__Patescibacteria,\n",
    "    Euryarchaeota = d__Archaea.p__Euryarchaeota,\n",
    "    Fibrobacterota = d__Bacteria.p__Fibrobacterota,\n",
    "    Bdellovibrionota = d__Bacteria.p__Bdellovibrionota,\n",
    "    Cyanobacteria = d__Bacteria.p__Cyanobacteria,\n",
    "    Desulfobacterota = d__Bacteria.p__Desulfobacterota,\n",
    "    Planctomycetota = d__Bacteria.p__Planctomycetota,\n",
    "    Synergistota = d__Bacteria.p__Synergistota,\n",
    "    Elusimicrobiota = d__Bacteria.p__Elusimicrobiota,\n",
    "    SAR324_clade.Marine_group_B = d__Bacteria.p__SAR324_clade.Marine_group_B.,\n",
    "    Chloroflexi = d__Bacteria.p__Chloroflexi,\n",
    "    Crenarchaeota = d__Archaea.p__Crenarchaeota,\n",
    "    Fusobacteriota = d__Bacteria.p__Fusobacteriota,\n",
    "    Actinobacteriota = d__Bacteria.p__Actinobacteriota,\n",
    "    Methylomirabilota = d__Bacteria.p__Methylomirabilota,\n",
    "    Parabasalia = d__Eukaryota.p__Parabasalia,\n",
    "    Deinococcota = d__Bacteria.p__Deinococcota,\n",
    "    Campilobacterota = d__Bacteria.p__Campilobacterota,\n",
    "    Sumerlaeota = d__Bacteria.p__Sumerlaeota,\n",
    "    Gemmatimonadota = d__Bacteria.p__Gemmatimonadota,\n",
    "    WPS.2 = d__Bacteria.p__WPS.2,\n",
    "    Deferribacterota = d__Bacteria.p__Deferribacterota,\n",
    "    NB1.j = d__Bacteria.p__NB1.j,\n",
    "    Acidobacteriota = d__Bacteria.p__Acidobacteriota \n",
    "  )\n",
    "\n",
    "# Pivot to long format (one row per sample and phylum) for ggplot2\n",
    "data <-data%>%\n",
    "  pivot_longer(-index, names_to = \"Phylum\", values_to = \"Count\")\n",
    "\n",
    "# Re-read the table to get the metadata columns needed for plotting, to join back onto the long data frame\n",
    "meta_cols <- read.csv(\"/Users/jparker/miniconda3/Jupyter-Notebook/Manuscript_1/level-2.csv\")\n",
    "\n",
    "#Make into data frame\n",
    "meta_cols <- as.data.frame(meta_cols)\n",
    "\n",
    "# Keep only the columns needed for plotting\n",
    "meta_cols <- select(meta_cols, c(index,Elephant_ID,Family,Date.Sampled,AgeClass,Sex,Season,Age.at.Sampling))\n",
    "\n",
    "#Read Date.Sampled in as a date\n",
    "\n",
    "#Load lubridate, a package that tells R how to interpret dates easily\n",
    "library(lubridate) \n",
    "\n",
    "#Show R that Date.Sampled is in month/day/year format\n",
    "meta_cols <- meta_cols %>% \n",
    "  mutate(Date.Sampled = lubridate::mdy(Date.Sampled))\n",
    "\n",
    "#Add wanted metadata columns into the long format data frame\n",
    "data <- data %>%\n",
    "  left_join(meta_cols, by = \"index\")\n",
    "\n",
    "\n",
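    "# Fix the sex recorded for M25.0012's first sample.\n",
    "# After sorting by Elephant_ID and date, rows 859:891 are that sample's 33 rows (one per phylum);\n",
    "# this hard-coded range must be updated if the input table or the set of phyla changes.\n",
    "data <- arrange(data, Elephant_ID, Date.Sampled)\n",
    "\n",
    "data$Sex[859:891] <- \"M\"\n",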
    "\n",
    "# Order phyla by relative abundance, as shown in \"taxa-bar-plots-for-stacked-figure.qzv\" (QIIME 2 output viewed in this notebook)\n",
    "phyla_order <- c(\"Firmicutes\",\n",
    "                 \"Bacteroidota\",\n",
    "                 \"Euryarchaeota\",\n",
    "                 \"Actinobacteriota\",\n",
    "                 \"Proteobacteria\",\n",
    "                 \"Halobacterota\",\n",
    "                 \"Verrucomicrobiota\",\n",
    "                 \"Planctomycetota\",\n",
    "                 \"Spirochaetota\",\n",
    "                 \"Armatimonadota\",\n",
    "                 \"Synergistota\",\n",
    "                 \"Chloroflexi\",\n",
    "                 \"Fibrobacterota\",\n",
    "                 \"Desulfobacterota\",\n",
    "                 \"Thermoplasmatota\",\n",
    "                 \"Cyanobacteria\",\n",
    "                 \"Patescibacteria\",\n",
    "                 \"Bdellovibrionota\",\n",
    "                 \"Elusimicrobiota\",\n",
    "                 \"WPS.2\",\n",
    "                 \"Myxococcota\",\n",
    "                 \"SAR324_clade.Marine_group_B\",\n",
    "                 \"Parabasalia\",\n",
    "                 \"Fusobacteriota\",\n",
    "                 \"Acidobacteriota\",\n",
    "                 \"Campilobacterota\",\n",
    "                 \"Deferribacterota\",\n",
    "                 \"Gemmatimonadota\",\n",
    "                 \"Crenarchaeota\",\n",
    "                 \"Deinococcota\",\n",
    "                 \"Methylomirabilota\",\n",
    "                 \"NB1.j\",\n",
    "                 \"Sumerlaeota\")\n",
    "\n",
    "data <- data %>%\n",
    "  mutate(Phylum = factor(Phylum, levels=phyla_order))\n",
    "\n",
    "# Lump the less abundant phyla into \"Other\" for the plot (code adapted from https://campus.datacamp.com/courses/categorical-data-in-the-tidyverse/manipulating-factor-variables?ex=8)\n",
    "\n",
    "data <- data %>%\n",
    "  mutate(Phylum = fct_other(Phylum,\n",
    "                            drop = c( \"Chloroflexi\",\n",
    "                                      \"Fibrobacterota\",\n",
    "                                      \"Desulfobacterota\",\n",
    "                                      \"Thermoplasmatota\",\n",
    "                                      \"Cyanobacteria\",\n",
    "                                      \"Patescibacteria\",\n",
    "                                      \"Bdellovibrionota\",\n",
    "                                      \"Elusimicrobiota\",\n",
    "                                      \"WPS.2\",\n",
    "                                      \"Myxococcota\",\n",
    "                                      \"SAR324_clade.Marine_group_B\",\n",
    "                                      \"Parabasalia\",\n",
    "                                      \"Fusobacteriota\",\n",
    "                                      \"Acidobacteriota\",\n",
    "                                      \"Campilobacterota\",\n",
    "                                      \"Deferribacterota\",\n",
    "                                      \"Gemmatimonadota\",\n",
    "                                      \"Crenarchaeota\",\n",
    "                                      \"Deinococcota\",\n",
    "                                      \"Methylomirabilota\",\n",
    "                                      \"NB1.j\",\n",
    "                                      \"Sumerlaeota\")))\n",
    "\n",
    "# Sort data by sampling date\n",
    "data <- data %>%\n",
    "  arrange(Date.Sampled) %>%\n",
    "  mutate(sample_order = factor(index, levels = unique(index)))  # preserve sample order in plot\n",
    "\n",
    "# Load any additional packages\n",
    "library(scales)  # for percent axis labels\n",
    "library(ggplot2)\n",
    "\n",
    "# Create the unified stacked bar plot\n",
    "p_all <- ggplot(data, aes(x = sample_order, y = Count, fill = Phylum)) +\n",
    "  geom_bar(stat = \"identity\", position = \"fill\", width = .9) +  # position = \"fill\" rescales each bar to 100%; width 0.9 is the ggplot2 default\n",
    "  scale_y_continuous(name = \"Relative abundance\", labels = percent_format(), expand = c(0, 0)) +\n",
    "  scale_x_discrete(name = \"Sample (ordered by date collected)\", breaks = NULL) +  # remove cluttered x-axis labels\n",
    "  theme_minimal(base_size = 10) +\n",
    "  theme(\n",
    "    axis.text.x = element_blank(),\n",
    "    axis.text.y = element_text(size=16),\n",
    "    axis.ticks.x = element_blank(),\n",
    "    panel.grid.major.x = element_blank(),\n",
    "    axis.title.y = element_text(size=18, face=\"bold\"),\n",
    "    axis.title.x = element_text(size=18, face=\"bold\"),\n",
    "    legend.text = element_text(size = 14),\n",
    "    legend.title = element_text(size=18, face=\"bold\"),\n",
    "    legend.position = \"right\"\n",
    "  ) +\n",
    "  scale_fill_brewer(palette = \"Paired\")  # Paired has 12 colours: the 11 retained phyla plus Other\n",
    "\n",
    "# View the plot\n",
    "print(p_all)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
